« If open science is done transparently, it doesn’t have to be done perfectly. You can always refine as you go. »

Last year at the Open Science Retreat (#OSR24NL) I was introduced to nanopubs by @egonw and created my first nanopub, declaring citations for a paper using CiTO (the Citation Typing Ontology).
Now, travelling to #OSR25CH with plenty of time in hand because of anticipated train network issues, I used the opportunity to create a new one ( https://w3id.org/np/RAicq7k9QHX8EG8ho7Baib9GxHsU18O0tyFCZ4tbbGGPA ) for my latest publication on teaching #Snakemake on #HPC systems.
The teaching material is — again — in desperate need of additions and overhaul, but that is for another day.
If you’re into #ReproducibleResearch & #OpenScience, don’t miss this MOOC
https://scholar.social/@khinsen/114314471616473600
Lots of practical tools for computational reproducibility, including the unequaled #Guix :-), all this brought to you by experts in the field, starting with @khinsen.
Reading report 182 by the Institute for Replication:
« While demand estimates perfectly replicate, the production function estimates of Hong and Luparello (2024) differ slightly from those in Orr (2022) to the second decimal.
Hong and Luparello (2024) argue this due to numerical instability of STATA’s matrix inversion when recovering marginal cost, which I believe is correct.
The documentation for the particular command used for the marginal cost inversion — luinv in STATA — notes that different computers can give different results when a matrix is close to singular. »
Ok, now I would like to inspect the numerical method behind ’luinv’.
Doc reads « This function uses the MKL LAPACK by default. » … but for some cases Netlib LAPACK could also be used. Hum?!
Which LAPACK version? Compiled using which options? In other words, which computational environment?
https://i4replication.org
https://www.stata.com/manuals/m-5luinv.pdf
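To get a feel for why a nearly singular matrix makes inversion so touchy, here is a minimal Python sketch. It is only an illustration under assumptions: it uses the closed-form 2x2 inverse, not the LU-based LAPACK routine behind luinv, and the matrix and the size of the perturbation are made up for the example.

```python
def inverse_2x2(m):
    """Invert a 2x2 matrix [[a, b], [c, d]] with the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical nearly singular matrix: the second row almost equals the first.
eps = 1e-10
nearly_singular = [[1.0, 1.0], [1.0, 1.0 + eps]]

# Same matrix with one entry nudged by 1e-12, a stand-in for the tiny
# rounding differences between machines, compilers, or LAPACK builds.
perturbed = [[1.0, 1.0], [1.0, 1.0 + eps + 1e-12]]

inv_a = inverse_2x2(nearly_singular)
inv_b = inverse_2x2(perturbed)

input_change = 1e-12 / (1.0 + eps)
output_change = abs(inv_a[1][1] - inv_b[1][1]) / abs(inv_a[1][1])

print(f"relative input change:  {input_change:.1e}")
print(f"relative output change: {output_change:.1e}")
```

The relative change in the inverse is many orders of magnitude larger than the change in the input, which is why the exact LAPACK build and compile options can matter for a matrix like this.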
Do you know Stata?
« a general-purpose statistical proprietary software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers in many fields, including biomedicine, economics, epidemiology, and sociology. »
Guess how much it costs? About USD 1,000 per year, for the government/nonprofit single-user option. Bang!
And guess what?
For this price, you only get the “fast” option: not “twice as fast”, nor “almost four times as fast”, nor “even faster”. Ha!
Crazy to read that this is still where we are in 2025 for #ReproducibleResearch and #OpenScience in some academic fields…
https://en.wikipedia.org/wiki/Stata
https://www.stata.com/order/new/gov/single-user-licenses/dl/
« Mass Reproducibility and Replicability: A New Hope »
Reproducing at scale, through “reproducible games”, to change the norms… A thought-provoking approach from the Institute for Replication!
As part of the French Reproducible Research network, the “Computational Environment” Working Group has just opened the collaborative writing of “notecards”.
Software: identify what? use where? develop how?
The idea is to have something like a compass for finding one's way through the existing questions and resources around “software” and “reproducibility”.
How to contribute? Proofreading, feedback, resources, writing, … All your ideas, really!
https://gt-env-logiciels.gricad-pages.univ-grenoble-alpes.fr/sandbox-notecards
The repository: https://gricad-gitlab.univ-grenoble-alpes.fr/gt-env-logiciels/sandbox-notecards
What is always exciting about the French Reproducible Research Network days is the interdisciplinarity.
Yesterday, after opening sessions on editorial practices, there were presentations on MRI image analysis and preclinical research.
Then a discussion over coffee and pralines about geomatics with @NRoelandt.
This morning, a presentation on Open Science in prehistoric archaeology, followed by one on astrophysics.
A great stimulation through cross-pollination.
https://jrfrr-2025.sciencesconf.org/?lang=fr
#ReproducibleResearch #OpenScience
As of 2019, less than 25% of the papers in ecology & evolution came with their data; less than 20% came with their code. Ouch.
Glad to be in Lyon for the French Reproducible Research Network days.
A rich programme!
https://jrfrr-2025.sciencesconf.org/?lang=fr
#ReproducibleResearch #OpenScience
I recently did a live demo of a "reproducibility audit" for @us_rse. Check it out: https://www.youtube.com/watch?v=Q2ZsLbBkWrk
Ugh, #SnakeMake apparently has a hard requirement that all input files (and previously made output files) be present on disk. Can't even do a dry run with `snakemake -n`.
Today is the day of closed pull requests for #Snakemake. The #SnakemakeHackathon2025 participants worked at full speed!
We decided to write a white paper summarizing our achievements rather than posting individual items. Suffice it to say that the documentation also made a great leap towards better readability!
#SnakemakeHackathon2025! We have started!
At CERN for better #ReproducibleComputing and #ReproducibleResearch.
« Revisiting my PhD dissertation after 4 years
or Why I wanted to make my dissertation reproducible »
Regret 5: Not keeping track of dependencies
« Dependencies are not only about the R packages. Some R packages require certain software to be installed on the OS, which are called system requirements. For example, the ggplot2 package requires clang++ (C++ compiler), which usually comes with an operating system. The situation gets complex when installing a package with multiple package dependencies that require different system tools. For example, installing the kableExtra package requires the svgLite and xml2 packages that require libpng and libxml2 as system requirements, respectively. So, I had to deal with the system requirements of the 240+ packages in addition to installing those packages. The process was time-consuming. »
Maybe a perfect job for #Guix?
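The dissertation quoted above is about R, but the same bookkeeping idea can be sketched in Python using only the standard library. This is a hypothetical helper (`snapshot_environment` is a name made up here), and it has exactly the limitation the quote describes: it records the interpreter and installed packages, not system requirements such as libpng or libxml2.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Record interpreter, platform, and installed package versions."""
    packages = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            packages.append({"name": str(name), "version": str(dist.version)})
    packages.sort(key=lambda p: p["name"].lower())
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": packages,
    }


snapshot = snapshot_environment()
print(json.dumps(snapshot, indent=2)[:400])
```

Saving such a snapshot next to the results is a start, but the system-level layer stays invisible to it, which is where a tool like Guix could take over.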
Applying the FAIR Principles to computational workflows
I will continue to find it disturbing when new #HPC cluster users explicitly instruct a program to use only one core/CPU and then complain that the cluster is so slow. Slower than their basement server.
Usually they do not spot their mistake on their own.
But THIS is actually NOT the disturbing part: such users also tend to always use default parameters. This might or might not be the sensible thing to do for their problem. Also, papers frequently do not report how the software was parameterized.
We have a long way to go.
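One small step in that direction, as a hedged Python sketch: write the parameters and the core count next to every run, so they can be reported later. The tool name, the parameter names, and the helper `run_record` are all hypothetical.

```python
import json
import os


def available_cores():
    """Cores this process may use; falls back to the machine total."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def run_record(tool, params, threads):
    """Bundle what a methods section would need to report about one run."""
    return {
        "tool": tool,
        "parameters": params,
        "threads_requested": threads,
        "cores_available": available_cores(),
    }


# Hypothetical tool and parameters, for illustration only.
record = run_record("example-aligner", {"seed_length": 19}, threads=1)
if record["threads_requested"] < record["cores_available"]:
    print("note: the job requests fewer cores than it could use")
print(json.dumps(record, indent=2))
```

On a cluster, the affinity-based count usually reflects what the scheduler actually granted, which is more useful than the node total.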
break? #Guix #HPC activity report is out!
Check it out:
https://hpc.guix.info/blog/2025/02/guix-hpc-activity-report-2024/
The seventh report, time flies! Highlight of key achievements:
• More than 57,000 packages, Guix + related channels for scientific packages;
• Performance and portability: more about MPI;
• ROCm/HIP software stack for AMD GPUs upgrades;
• Migration of Guix-Science channel to Codeberg;
• New version of Guix-Jupyter;
• Common Workflow Language #CWL + Guix: ccwl and ravanan;
• Ensuring Source Code Availability: #SoftwareHeritage rescue mode;
• Re-Deploying Software from the Past;
• Supporting Artifact Evaluation at SuperComputing 2024;
• “Package to Container image” conversion pipeline, #DIAMOND and #DIADEM;
• Digital Electronics Design;
• Reproducible Multiphysics Simulation and Workflows;
• Impact of Hardware Variability;
• Toward Guix on French Tier-1 and EuroHPC Tier-0 machines;
• Pangenome Genetics Research;
• Supporting RISC-V;
• List of articles, talks, tutorials, events, training sessions.
• … and more…
What a year, isn’t it?