Hopf Paper Case Study
This chapter explains how the wrapper can be used in a real reproduction task.
The example is the hemp paper:
Hopf et al. (2025), "Adaptation of the process-based CSM-CROPGRO model to simulate the growth and development of industrial hemp for seed and fiber production."
Why this case study matters
It is a good teaching example because it combines:
- a real publication
- real observed data files
- real genotype coefficients
- a model family that required install metadata to be consistent
- a workflow that can later inform work in other model frameworks
The four key experiment files
In the local reproduction workspace, the exact paper experiment files came from dssat-csm-data/Hemp:
- UFCI2101.HMX
- UFCI2201.HMX
- UFJA2101.HMX
- UFJA2201.HMX
Together they define the 15 Florida cases listed in the paper's experiment table.
The matching observation files
Each experiment also had:
- a time-course observation file
- an anthesis or summary observation file
For hemp, these appeared as:
- .HMT
- .HMA
That is exactly the kind of structure the wrapper's observation-reading logic needs to support if it is going to help with paper reproduction.
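As a minimal sketch of that structure, the observation file names can be derived from an experiment file name by swapping the extension. This assumes that the observation files share the experiment file's stem, and that .HMT holds the time-course data and .HMA the anthesis or summary data, following the order in which they are listed above. The function name observation_paths is hypothetical and is not part of the wrapper.

```python
from pathlib import Path

# Assumption: .HMT is the time-course file and .HMA the anthesis/summary file,
# and both share the stem of the .HMX experiment file.
OBSERVATION_SUFFIXES = {"time_course": ".HMT", "summary": ".HMA"}


def observation_paths(experiment_file):
    """Return the expected observation file paths for one experiment file.

    Hypothetical helper for illustration; it only builds paths and does not
    check that the files exist.
    """
    experiment = Path(experiment_file)
    return {
        kind: experiment.with_suffix(suffix)
        for kind, suffix in OBSERVATION_SUFFIXES.items()
    }


paths = observation_paths("dssat-csm-data/Hemp/UFCI2101.HMX")
for kind, path in paths.items():
    print(kind, path.name)
```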
Why the observation-path fix mattered
Originally, the new wrapper could read observations only from the installed DSSAT crop folder.
That was not enough for this paper workflow, because the exact paper experiment files and observations lived in an external project directory.
The wrapper was updated so that observation files can be found relative to the project_file directory when needed.
That small technical change is a good example of how real case studies improve general wrapper design.
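The lookup described in this section could be sketched roughly as follows. This is an illustration under stated assumptions, not the wrapper's actual code: it assumes project_file is a path to a file whose parent directory should be searched, that the project directory is tried before the installed crop folder, and the names find_observation_file and crop_dir are hypothetical.

```python
from pathlib import Path


def find_observation_file(filename, project_file=None, crop_dir=None):
    """Return the first existing path for an observation file, or None.

    Assumed search order (not confirmed by the wrapper source):
    1. the directory that contains project_file
    2. the installed DSSAT crop folder (crop_dir)
    """
    candidates = []
    if project_file is not None:
        candidates.append(Path(project_file).parent / filename)
    if crop_dir is not None:
        candidates.append(Path(crop_dir) / filename)

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising leaves it to the caller to decide whether a missing observation file is an error or simply means that a case has no observed data.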
First-pass reproduction logic
The reproduction workflow does this:
- define the 15 paper cases explicitly
- run each case through DSSAT_omniwrapper()
- read the matching observations
- join simulated and observed values on date
- compute first-pass metrics such as d and RMSE
- compare observed and simulated flowering dates
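The join and metric steps can be sketched with the standard library alone. The sketch assumes that d refers to Willmott's index of agreement, which the text does not spell out, and that each series is available as a mapping from date to value. The function names are hypothetical, and the numbers are invented for illustration and are not taken from the paper.

```python
import math
from datetime import date


def join_on_date(simulated, observed):
    """Pair simulated and observed values for dates present in both series."""
    common_dates = sorted(simulated.keys() & observed.keys())
    return [(simulated[day], observed[day]) for day in common_dates]


def rmse(pairs):
    """Root mean square error over (simulated, observed) pairs."""
    if not pairs:
        raise ValueError("no overlapping dates to compare")
    return math.sqrt(sum((s - o) ** 2 for s, o in pairs) / len(pairs))


def willmott_d(pairs):
    """Willmott's index of agreement (assumed meaning of d)."""
    if not pairs:
        raise ValueError("no overlapping dates to compare")
    obs_mean = sum(o for _, o in pairs) / len(pairs)
    numerator = sum((s - o) ** 2 for s, o in pairs)
    denominator = sum(
        (abs(s - obs_mean) + abs(o - obs_mean)) ** 2 for s, o in pairs
    )
    if denominator == 0:
        raise ValueError("d is undefined when all values equal the observed mean")
    return 1.0 - numerator / denominator


# Illustrative values only, not results from the paper.
simulated = {
    date(2021, 6, 1): 100.0,
    date(2021, 6, 15): 300.0,
    date(2021, 7, 1): 500.0,
}
observed = {
    date(2021, 6, 15): 280.0,
    date(2021, 7, 1): 520.0,
    date(2021, 8, 1): 700.0,
}

pairs = join_on_date(simulated, observed)
print(len(pairs), rmse(pairs), willmott_d(pairs))
```

Only dates present in both series contribute to the metrics, so the number of joined pairs is worth reporting next to d and RMSE.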
What the first pass can and cannot claim
It can claim:
- the paper experiments rerun successfully
- the observed and simulated data can be aligned reproducibly
- the resulting performance metrics are strong enough to justify deeper analysis
It should not yet claim:
- that every published metric has been matched exactly
- that every figure in the paper has been rebuilt identically
- that every aggregation choice used by the authors has already been reproduced
That distinction is part of good scientific communication.
Why this case study belongs in the book
It belongs here because it shows the difference between:
- building a wrapper in theory
- building one that survives contact with a real publication workflow
That is the level of evidence that helps a community trust a tool.