Reproducibility and Transparency in a Digital World


Dr. James Doss-Gollin

Thu., Sep. 19

Academic Integrity

Today

  1. Academic Integrity

  2. Citation Management with Zotero

  3. Reproducibility and Transparency

Rice Honor Code

http://honor.rice.edu/

On my honor, I have neither given nor received any unauthorized aid on this (exam, quiz, paper).

Figure: Plagiarism Spectrum 2.0 (Turnitin, 2021).

Plagiarism in CEVE 101

  1. I expect that you are still learning proper citation practices, and I aim to support you in this learning
  2. Aim for honesty and integrity always
    • Example: citing sources using a URL in a footnote

Integrity

Credit

Shoutout to my colleagues Vivek Srikrishnan and Tony Wong for these examples

Example: Academic Integrity

Dan searches the internet for relevant code and copy-pastes it into his Jupyter notebook. He properly cites the source of the code.

Probably Not OK:

  • What portion of the work is Dan’s?
  • How important was the copied code?
  • Did Dan understand the code he copied?

Example: Academic Integrity

Matthew and Rhonda work together to figure out how to implement the code, but each works on their own computer and develops their own software.

Definitely OK:

  • Matthew and Rhonda have collaborated to understand how to solve the problem(s), but each has written up their own solution, demonstrating their own understanding.

Example: Academic Integrity

Felix and Rachel are working together on a problem involving a derivation. Rachel types it up in LaTeX and sends the code to Felix, who pastes it into his Jupyter notebook.

Likely Not OK:

  • Did Felix contribute enough to the derivation?
  • Definitely not OK if Felix doesn’t give Rachel credit for her contribution.

LLMs and AI

From Syllabus:

  1. LLMs don’t do your work for you
  2. LLMs don’t do your thinking for you
  3. If you replace Rachel or “the internet” in the previous examples with “Claude” or “ChatGPT”, the answer is still the same!

LLMs and AI

There are many helpful and productive uses of AI!

  1. Structured feedback on your work
  2. Debugging code
  3. Translating ideas between computational representations

LLM example:

LLMs (and other specialized tools) can help you convert text- or image-based representations of an equation into a more readily shareable format

\frac{\partial \mathbf{u}}{\partial t}
+ \underbrace{(\mathbf{u} \cdot \nabla)\mathbf{u}}_{\text{convection}}
= -\frac{1}{\rho}\nabla p
+ \underbrace{\nu \nabla^2\mathbf{u}}_{\text{diffusion}}
+ \overbrace{\mathbf{g}}^{\text{body forces}}

\[ \frac{\partial \mathbf{u}}{\partial t} + \underbrace{(\mathbf{u} \cdot \nabla)\mathbf{u}}_{\text{convection}} = -\frac{1}{\rho}\nabla p + \underbrace{\nu \nabla^2\mathbf{u}}_{\text{diffusion}} + \overbrace{\mathbf{g}}^{\text{body forces}} \]

Citation Management with Zotero

Reference Management

  1. Collect
  2. Organize
  3. Cite
  4. Write

Why Zotero?

There are many tools (EndNote, Mendeley, Papers, PaperPile, etc.) but Zotero stands out

  • Free
  • Open source (no lock-in)
  • Cross platform
  • Cloud sync (free for you)
  • Many export formats

Quick demos

  1. Add a paper to library using web connector
  2. Add a paper to library using DOI
  3. Manually add a report
  4. Cite a source in Google Docs

Reproducibility and Transparency

Definitions

  1. Reproducibility: The ability to generate the same insights using the same data, code, and methods.
  2. Repeatability: The ability to run the same experiment or analysis in the same lab or environment and get the same results.
  3. Replicability: The ability to achieve similar results using different methods, datasets, or variations on the original study.
  4. Transparency: The practice of making all data, code, and methodology openly available and clearly documented.
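
As a rough illustration of repeatability, the Python sketch below fixes a random seed so that a stochastic analysis returns the same number every time it is run in the same environment. The function name `simulate_mean_annual_max`, the seed values, and the Gaussian parameters are hypothetical choices made for this example and do not come from the course materials.

```python
import random
import statistics


def simulate_mean_annual_max(seed: int, n_years: int = 50) -> float:
    """Average of synthetic annual maxima drawn from a seeded generator."""
    # A local generator avoids hidden global state, so the result
    # depends only on the arguments passed in.
    rng = random.Random(seed)
    # Illustrative assumption: maxima are Gaussian with mean 100, sd 15.
    annual_maxima = [rng.gauss(100.0, 15.0) for _ in range(n_years)]
    return statistics.mean(annual_maxima)


first_run = simulate_mean_annual_max(seed=2024)
second_run = simulate_mean_annual_max(seed=2024)
other_seed = simulate_mean_annual_max(seed=2025)

# Same seed and same code: the two runs agree exactly (repeatable).
print("repeat runs identical:", first_run == second_run)
# A different seed gives a different sample, so the estimate shifts.
print("difference across seeds:", abs(first_run - other_seed))
```

Recording the seed alongside the code and data is one small step toward reproducibility: someone else can then regenerate the same numbers rather than only similar ones.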

Importance

  1. Reproducibility ensures that other researchers can verify the results of a study using the same data and approach.
  2. Repeatability confirms that the original experiment was performed accurately and consistently.
  3. Replicability strengthens findings by validating them under different conditions or with different data.
  4. Transparency builds trust in research by making it possible for others to scrutinize the processes and decisions behind the results.

Relevance

These concepts are especially relevant for infrastructure planning and climate risk management:

  1. Small changes in underlying assumptions (i.e., details) can have large impacts on outcomes
    • Not possible to describe every single detail using text
    • Different assumptions can look similar on past data but yield very different conclusions about the future
  2. Many assumptions are impossible to verify objectively
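
The point about assumptions that look similar on past data can be sketched with a small, entirely synthetic example. The 30-year record below is made up for illustration (1% annual growth plus a small oscillation), and the two candidate models (a straight line and an exponential curve) are assumptions chosen for this sketch, not methods prescribed by the course.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Synthetic "historical record": NOT real data.
years = list(range(30))
record = [100.0 * 1.01 ** t + 2.0 * math.sin(t) for t in years]

# Assumption A: the quantity grows linearly in time.
a_lin, b_lin = fit_line(years, record)
# Assumption B: the quantity grows exponentially (linear fit on the log scale).
a_log, b_log = fit_line(years, [math.log(v) for v in record])


def linear_model(t):
    return a_lin + b_lin * t


def exponential_model(t):
    return math.exp(a_log + b_log * t)


rmse_linear = rmse(record, [linear_model(t) for t in years])
rmse_exponential = rmse(record, [exponential_model(t) for t in years])

horizon = 100  # years after the start of the record
projection_linear = linear_model(horizon)
projection_exponential = exponential_model(horizon)

print(f"in-sample RMSE: linear={rmse_linear:.2f}, exponential={rmse_exponential:.2f}")
print(f"year-{horizon} projection: linear={projection_linear:.1f}, "
      f"exponential={projection_exponential:.1f}")
```

Both models fit the synthetic past about equally well, yet their long-range projections diverge substantially. A written report might describe either one as "a trend fit", which is why sharing the code matters.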

Are there examples of research where these issues are less important?

How to Improve Openness

Openness is less a strict, well-defined standard than a direction to work towards as we acquire new tools and skills (Always Be Improving). For example, instead of doing analysis in Excel and sharing your findings with a report, you could:

  1. Do calculations in Excel and share the spreadsheet with the report
  2. Do analysis with computational scripts that others can examine
  3. Share all input and output data as part of the analysis
  4. Use open-source tools with a well-documented environment so others can replicate your software stack
  5. Integrate pipeline tools so that others can run your analysis with a single command
  6. Combine your code and analysis into a single notebook
  7. Write clear, readable code with comments and documentation
  8. Integrate automated tests to increase confidence in your numerical implementation
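
Several of these steps (analysis in a script, readable code with documentation, automated tests) can be combined in a few lines. The sketch below is one possible shape, not a required template. The function name `prob_at_least_one_exceedance` is hypothetical, and the formula assumes that years are independent with a constant annual exceedance probability of 1/T.

```python
import math


def prob_at_least_one_exceedance(return_period_years: float, horizon_years: int) -> float:
    """Probability that a T-year event occurs at least once in n years.

    Assumes independent years, each with exceedance probability 1/T.
    """
    if return_period_years < 1:
        raise ValueError("return period must be at least 1 year")
    if horizon_years < 0:
        raise ValueError("horizon must be non-negative")
    annual_probability = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_probability) ** horizon_years


def test_prob_at_least_one_exceedance():
    # No exposure time means no chance of an exceedance.
    assert prob_at_least_one_exceedance(100, 0) == 0.0
    # Over a single year the result is just the annual probability.
    assert math.isclose(prob_at_least_one_exceedance(100, 1), 0.01)
    # A longer horizon can only increase the probability.
    assert prob_at_least_one_exceedance(100, 30) < prob_at_least_one_exceedance(100, 60)
    # Sanity check against a hand calculation: roughly 26% over 30 years.
    assert math.isclose(prob_at_least_one_exceedance(100, 30), 0.26, abs_tol=0.005)


test_prob_at_least_one_exceedance()
print("all checks passed")
```

Checks like these are cheap to write and catch silent mistakes, such as swapping the return period and the horizon, before they reach a report. A test runner such as pytest can collect functions named `test_*` automatically, so the explicit call at the bottom is only needed when running the file directly.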

Wrapup

  1. Feedback on project 1 one-pagers today
  2. Project 1 templates
  3. Arts + Engineering Undergrad Social tomorrow 9/20