Collection

Analysis

Deriving meaning and knowledge from data. Software, code, licensing, maintenance, statistics, methods, code sharing, documentation, and more.

109 affiliated resources

Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology

Unrestricted Use

Public Domain

Ongoing technological developments have made it easier than ever before for scientists to share their data, materials, and analysis code. Sharing data and analysis code makes it easier for other researchers to re-use or check published research. These benefits will only emerge if researchers can reproduce the analysis reported in published articles, and if data is annotated well enough so that it is clear what all variables mean. Because most researchers have not been trained in computational reproducibility, it is important to evaluate current practices to identify practices that can be improved. We examined data and code sharing, as well as computational reproducibility of the main results, without contacting the original authors, for Registered Reports published in the psychological literature between 2014 and 2018. Of the 62 articles that met our inclusion criteria, data was available for 40 articles, and analysis scripts for 37 articles. For the 35 articles that shared both data and code and performed analyses in SPSS, R, Python, MATLAB, or JASP, we could run the scripts for 31 articles, and reproduce the main results for 20 articles. Although the proportion of articles that shared both data and code (35 out of 62, or 56%) and the proportion that could be computationally reproduced (20 out of 35, or 57%) were relatively high compared to other studies, there is clear room for improvement. We provide practical recommendations based on our observations, and link to examples of good research practices in the papers we reproduced.

Subject:: Psychology; Social Science
Material Type:: Reading
Author:: Daniel Lakens; Jaroslav Gottfried; Nicholas Alvaro Coles; Pepijn Obels; Seth Ariel Green
Date Added:: 08/07/2020

Análisis y visualización de datos usando Python

Unrestricted Use

CC BY

Python is a general-purpose programming language that is useful for writing scripts to work with data effectively and reproducibly. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in one day (~ 6 hours). The lessons start with basic information about Python syntax and the Jupyter Notebook interface, and continue with how to import CSV files, how to use the pandas package to work with DataFrames, how to calculate summary information for a DataFrame, and a brief introduction to creating visualizations. The last lesson demonstrates how to work with databases directly from Python. Note: the data have not been translated from the original English version, so variable names remain in English and the numbers in each observation use English-language conventions (comma as the thousands separator and period as the decimal separator).

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Alejandra Gonzalez-Beltran; April Wright; Christopher Erdmann; Enric Escorsa O'Callaghan; Erin Becker; Fernando Garcia; Hely Salgado; Juan M. Barrios; Juan Martín Barrios; Katrin Leinweber; LUS24; Laura Angelone; Leonardo Ulises Spairani; Maxim Belkin; Miguel González; Nicolás Palopoli; Nohemi Huanca Nunez; Paula Andrea Martinez; Raniere Silva; Rayna Harris; Sarah Brown; Silvana Pereyra; Spencer Harris; Stephan Druskat; Trevor Keller; Wilson Lozano; chekos; monialo2000; rzayas
Date Added:: 08/07/2020

Automation and Make

Unrestricted Use

CC BY

A Software Carpentry lesson on how to use Make. Make is a tool that can run commands to read files, process these files in some way, and write out the processed files. For example, in software development, Make is used to compile source code into executable programs or libraries, but Make can also be used to: run analysis scripts on raw data files to get data files that summarize the raw data; run visualization scripts on data files to produce plots; and parse and combine text files and plots to create papers. Make is called a build tool: it builds data files, plots, papers, programs, or libraries. It can also update existing files if desired. Make tracks the dependencies between the files it creates and the files used to create them. If one of the original files (e.g. a data file) is changed, then Make knows to recreate, or update, the files that depend upon this file (e.g. a plot). There are now many build tools available, all of which are based on the same concepts as Make.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Adam Richie-Halford; Ana Costa Conrado; Andrew Boughton; Andrew Fraser; Andy Kleinhesselink; Andy Teucher; Anna Krystalli; Bill Mills; Brandon Curtis; David E. Bernholdt; Deborah Gertrude Digges; François Michonneau; Gerard Capes; Greg Wilson; Jake Lever; Jason Sherman; John Blischak; Jonah Duckles; Juan F Fung; Kate Hertweck; Lex Nederbragt; Luiz Irber; Matthew Thomas; Michael Culshaw-Maurer; Mike Jackson; Pete Bachant; Piotr Banaszkiewicz; Radovan Bast; Raniere Silva; Rémi Emonet; Samuel Lelièvre; Satya Mishra; Trevor Bekolay
Date Added:: 03/20/2017
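
The dependency tracking described in the Automation and Make entry above can be illustrated with a minimal sketch in Python. It is not Make itself and not taken from the lesson: it only mimics the core idea of comparing file modification times to decide whether a target is out of date. The function names (`needs_rebuild`, `build`, `demo`) and the file names (`data.csv`, `summary.txt`) are hypothetical, chosen for illustration.

```python
import os
import tempfile
import time


def needs_rebuild(target, dependencies):
    """Return True if the target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def build(target, dependencies, action):
    """Run the action only when the target is out of date; report whether it ran."""
    if needs_rebuild(target, dependencies):
        action()
        return True
    return False


def demo():
    """Build a summary file three times and record which runs did any work."""
    with tempfile.TemporaryDirectory() as workdir:
        data = os.path.join(workdir, "data.csv")        # hypothetical input file
        summary = os.path.join(workdir, "summary.txt")  # hypothetical output file

        with open(data, "w") as handle:
            handle.write("1\n2\n3\n")
        # Backdate the data file so the timestamp comparison is unambiguous.
        past = time.time() - 100
        os.utime(data, (past, past))

        def summarize():
            with open(data) as src, open(summary, "w") as out:
                values = [int(line) for line in src if line.strip()]
                out.write(f"total={sum(values)}\n")

        results = [build(summary, [data], summarize)]      # target does not exist yet
        results.append(build(summary, [data], summarize))  # target is up to date
        # Simulate an edit to the data file by moving its timestamp forward.
        future = time.time() + 100
        os.utime(data, (future, future))
        results.append(build(summary, [data], summarize))  # dependency is now newer
        return results


if __name__ == "__main__":
    print(demo())
```

The second call does no work because the summary is newer than the data file; the third call rebuilds because the data file's timestamp has moved past the summary's. Make adds much more on top of this comparison, such as chains of rules, which the sketch leaves out.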

Bayesian inference for psychology. Part II: Example applications with JASP

Unrestricted Use

CC BY

Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder’s BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.

Subject:: Psychology; Social Science
Material Type:: Reading
Provider:: Psychonomic Bulletin & Review
Author:: Akash Raj; Alexander Etz; Alexander Ly; Alexandra Sarafoglou; Bruno Boutin; Damian Dropmann; Don van den Bergh; Dora Matzke; Eric-Jan Wagenmakers; Erik-Jan van Kesteren; Frans Meerhoff; Helen Steingroever; Jeffrey N. Rouder; Johnny van Doorn; Jonathon Love; Josine Verhagen; Koen Derks; Maarten Marsman; Martin Šmíra; Patrick Knight; Quentin F. Gronau; Ravi Selker; Richard D. Morey; Sacha Epskamp; Tahira Jamil; Tim de Jong
Date Added:: 08/07/2020

Being a Reviewer or Editor for Registered Reports

Unrestricted Use

CC BY

Experienced Registered Reports editors and reviewers come together to discuss the format and best practices for handling submissions. The panelists also share insights into what editors are looking for from reviewers as well as practical guidelines for writing a Registered Report.

ABOUT THE PANELISTS:

Chris Chambers | Chris is a professor of cognitive neuroscience at Cardiff University, Chair of the Registered Reports Committee supported by the Center for Open Science, and one of the founders of Registered Reports. He has helped establish the Registered Reports format for over a dozen journals.

Anastasia Kiyonaga | Anastasia is a cognitive neuroscientist who uses converging behavioral, brain stimulation, and neuroimaging methods to probe memory and attention processes. She is currently a postdoctoral researcher with Mark D'Esposito in the Helen Wills Neuroscience Institute at the University of California, Berkeley. Before coming to Berkeley, she received her Ph.D. with Tobias Egner in the Duke Center for Cognitive Neuroscience. She will be an Assistant Professor in the Department of Cognitive Science at UC San Diego starting January, 2020.

Jason Scimeca | Jason is a cognitive neuroscientist at UC Berkeley. His research investigates the neural systems that support high-level cognitive processes such as executive function, working memory, and the flexible control of behavior. He completed his Ph.D. at Brown University with David Badre and is currently a postdoctoral researcher in Mark D'Esposito's Cognitive Neuroscience Lab.

Moderated by David Mellor, Director of Policy Initiatives for the Center for Open Science.

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Lecture
Provider:: Center for Open Science
Author:: Center for Open Science
Date Added:: 08/07/2020

Carpentries Instructor Training

Unrestricted Use

CC BY

A two-day introduction to modern evidence-based teaching practices, built and maintained by the Carpentry community.

Subject:: Applied Science; Computer Science; Education; Higher Education; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Aleksandra Nenadic; Alexander Konovalov; Alistair John Walsh; Allison Weber; Amy E. Hodge; Andrew B. Collier; Anita Schürch; AnnaWilliford; Ariel Rokem; Brian Ballsun-Stanton; Callin Switzer; Christian Brueffer; Christina Koch; Christopher Erdmann; Colin Morris; Dan Allan; DanielBrett; Danielle Quinn; Darya Vanichkina; David Jennings; Eric Jankowski; Erin Alison Becker; Evan Peter Williamson; François Michonneau; Gerard Capes; Greg Wilson; Ian Lee; Jason M Gates; Jason Williams; Jeffrey Oliver; Joe Atzberger; John Bradley; John Pellman; Jonah Duckles; Jonathan Bradley; Karen Cranston; Karen Word; Kari L Jordan; Katherine Koziar; Katrin Leinweber; Kees den Heijer; Laurence; Lex Nederbragt; Maneesha Sane; Marie-Helene Burle; Mik Black; Mike Henry; Murray Cadzow; Neal Davis; Neil Kindlon; Nicholas Tierney; Nicolás Palopoli; Noah Spies; Paula Andrea Martinez; Petraea; Rayna Michelle Harris; Rémi Emonet; Rémi Rampin; Sarah Brown; Sarah M Brown; Sarah Stevens; Sean; Serah Anne Njambi Kiburu; Stefan Helfrich; Steve Moss; Stéphane Guillou; Ted Laderas; Tiago M. D. Pereira; Toby Hodges; Tracy Teal; Yo Yehudi; amoskane; davidbenncsiro; naught101; satya-vinay
Date Added:: 08/07/2020

Connecting Research Tools to the Open Science Framework (OSF)

Unrestricted Use

CC BY

This webinar (recorded Sept. 27, 2017) introduces how to connect other services as add-ons to projects on the Open Science Framework (OSF; https://osf.io). Connecting services to your OSF projects via add-ons enables you to pull together the different parts of your research efforts without having to switch away from tools and workflows you wish to continue using. The OSF is a free, open source web application built to help researchers manage their workflows. The OSF is part collaboration tool, part version control software, and part data archive. The OSF connects to popular tools researchers already use, like Dropbox, Box, GitHub, and Mendeley, to streamline workflows and increase efficiency.

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Lecture
Provider:: Center for Open Science
Author:: Center for Open Science
Date Added:: 08/07/2020

Consequences of Low Statistical Power

Unrestricted Use

CC BY

This video will go over three issues that can arise when scientific studies have low statistical power. All materials shown in the video, as well as the content from our other videos, can be found here: https://osf.io/7gqsi/

Subject:: Applied Science; Computer Science; Information Science
Material Type:: Lecture
Provider:: Center for Open Science
Author:: Center for Open Science
Date Added:: 08/07/2020

Data Analysis and Visualization in Python for Ecologists

Unrestricted Use

CC BY

Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in one and a half days (~ 10 hours). They start with some basic information about Python syntax, the Jupyter notebook interface, and move through how to import CSV files, using the pandas package to work with data frames, how to calculate summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Maxim Belkin; Tania Allard
Date Added:: 03/20/2017

Data Analysis and Visualization in R for Ecologists

Unrestricted Use

CC BY

Data Carpentry lesson from Ecology curriculum to learn how to analyse and visualise ecological data in R. Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with ecology data in R. This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from R.

Subject:: Applied Science; Computer Science; Ecology; Information Science; Life Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Ankenbrand, Markus; Arindam Basu; Ashander, Jaime; Bahlai, Christie; Bailey, Alistair; Becker, Erin Alison; Bledsoe, Ellen; Boehm, Fred; Bolker, Ben; Bouquin, Daina; Burge, Olivia Rata; Burle, Marie-Helene; Carchedi, Nick; Chatzidimitriou, Kyriakos; Chiapello, Marco; Conrado, Ana Costa; Cortijo, Sandra; Cranston, Karen; Cuesta, Sergio Martínez; Culshaw-Maurer, Michael; Czapanskiy, Max; Daijiang Li; Dashnow, Harriet; Daskalova, Gergana; Deer, Lachlan; Direk, Kenan; Dunic, Jillian; Elahi, Robin; Fishman, Dmytro; Fouilloux, Anne; Fournier, Auriel; Gan, Emilia; Goswami, Shubhang; Guillou, Stéphane; Hancock, Stacey; Hardenberg, Achaz Von; Harrison, Paul; Hart, Ted; Herr, Joshua R.; Hertweck, Kate; Hodges, Toby; Hulshof, Catherine; Humburg, Peter; Jean, Martin; Johnson, Carolina; Johnson, Kayla; Johnston, Myfanwy; Jordan, Kari L; K. A. S. Mislan; Kaupp, Jake; Keane, Jonathan; Kerchner, Dan; Klinges, David; Koontz, Michael; Leinweber, Katrin; Lepore, Mauro Luciano; Li, Ye; Lijnzaad, Philip; Lotterhos, Katie; Mannheimer, Sara; Marwick, Ben; Michonneau, François; Millar, Justin; Moreno, Melissa; Najko Jahn; Obeng, Adam; Odom, Gabriel J.; Pauloo, Richard; Pawlik, Aleksandra Natalia; Pearse, Will; Peck, Kayla; Pederson, Steve; Peek, Ryan; Pletzer, Alex; Quinn, Danielle; Rajeg, Gede Primahadi Wijaya; Reiter, Taylor; Rodriguez-Sanchez, Francisco; Sandmann, Thomas; Seok, Brian; Sfn_brt; Shiklomanov, Alexey; Shivshankar Umashankar; Stachelek, Joseph; Strauss, Eli; Sumedh; Switzer, Callin; Tarkowski, Leszek; Tavares, Hugo; Teal, Tracy; Theobold, Allison; Tirok, Katrin; Tylén, Kristian; Vanichkina, Darya; Voter, Carolyn; Webster, Tara; Weisner, Michael; White, Ethan P; Wilson, Earle; Woo, Kara; Wright, April; Yanco, Scott; Ye, Hao
Date Added:: 03/20/2017

Data Analysis and Visualization with Python for Social Scientists

Unrestricted Use

CC BY

Python is a general purpose programming language that is useful for writing scripts to work effectively and reproducibly with data. This is an introduction to Python designed for participants with no programming experience. These lessons can be taught in a day (~ 6 hours). They start with some basic information about Python syntax, the Jupyter notebook interface, and move through how to import CSV files, using the pandas package to work with data frames, how to calculate summary information from a data frame, and a brief introduction to plotting. The last lesson demonstrates how to work with databases directly from Python.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Geoffrey Boushey; Stephen Childs
Date Added:: 08/07/2020

Data Carpentry for Biologists

Unrestricted Use

CC BY

The Biology Semester-long Course was developed and piloted at the University of Florida in Fall 2015. Course materials include readings, lectures, exercises, and assignments that expand on the material presented at workshops focusing on SQL and R.

Subject:: Applied Science; Biology; Computer Science; Information Science; Life Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Ethan White; Zachary Brym
Date Added:: 08/07/2020

Data Cleaning and Management Using OpenRefine

Conditional Remix & Share Permitted

CC BY-NC

Course materials on using OpenRefine, a powerful tool for cleaning and transforming tabular data.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Activity/Lab
Provider:: New York University
Author:: Nick Wolf; Vicky Steeves
Date Added:: 02/12/2019

Data Cleaning with OpenRefine for Ecologists

Unrestricted Use

CC BY

A part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting made consistent. This step must be taken with the same care and attention to reproducibility as the analysis. OpenRefine (formerly Google Refine) is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another. This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them literally months of work trying to make these edits by hand.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Cam Macdonell; Deborah Paul; Phillip Doehle; Rachel Lombardi
Date Added:: 03/20/2017

Data Intro for Archivists

Unrestricted Use

CC BY

This Library Carpentry lesson introduces archivists to working with data. At the conclusion of the lesson you will: be able to explain terms, phrases, and concepts in code or software development; identify and use best practice in data structures; use regular expressions in searches.

Subject:: Applied Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: James Baker; Jeanine Finn; Jenny Bunn; Katherine Koziar; Noah Geraci; Scott Peterson
Date Added:: 08/07/2020

Data Management & Reproducibility

Conditional Remix & Share Permitted

CC BY-NC

A presentation introducing data management and reproducibility for researchers.

Subject:: Applied Science; Life Science; Physical Science; Social Science
Material Type:: Lesson
Provider:: New York University
Author:: Vicky Steeves
Date Added:: 04/04/2019

Data Management with SQL for Ecologists

Unrestricted Use

CC BY

Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, FileMaker, etc.). The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them, and how you can query databases to extract just the information that you need.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Christina Koch; Donal Heidenblad; Katy Felkner; Rémi Rampin; Timothée Poisot
Date Added:: 03/20/2017

Data Management with SQL for Social Scientists

Unrestricted Use

CC BY

This is an alpha lesson to teach Data Management with SQL for Social Scientists. We welcome any criticism or error reports, and will take your feedback into account to improve both the presentation and the content. Databases are useful for both storing and using data effectively. Using a relational database serves several purposes. It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it. If we get new data we can rerun a query to find all the data that meets certain criteria. It’s fast, even for large amounts of data. It improves quality control of data entry (type constraints and use of forms in Access, FileMaker, etc.). The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python. This lesson will teach you what relational databases are, how you can load data into them, and how you can query databases to extract just the information that you need.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: Peter Smyth
Date Added:: 08/07/2020

Data Organization in Spreadsheets for Ecologists

Unrestricted Use

CC BY

Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. We organize data in spreadsheets in the ways that we as humans want to work with the data, but computers require that data be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: good data entry practices (formatting data tables in spreadsheets); how to avoid common formatting mistakes; approaches for handling dates in spreadsheets; basic quality control and data manipulation in spreadsheets; and exporting data from spreadsheets. In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:: Applied Science; Computer Science; Information Science; Mathematics; Measurement and Data
Material Type:: Module
Provider:: The Carpentries
Author:: Christie Bahlai; Peter R. Hoyt; Tracy Teal
Date Added:: 03/20/2017

Data Organization in Spreadsheets for Social Scientists

Unrestricted Use

CC BY

Lesson on spreadsheets for social scientists. Good data organization is the foundation of any research project. Most researchers have data in spreadsheets, so it’s the place that many research projects start. Typically we organize data in spreadsheets in ways that we as humans want to work with the data. However, computers require data to be organized in particular ways. In order to use tools that make computation more efficient, such as programming languages like R or Python, we need to structure our data the way that computers need the data. Since this is where most research projects start, this is where we want to start too! In this lesson, you will learn: good data entry practices (formatting data tables in spreadsheets); how to avoid common formatting mistakes; approaches for handling dates in spreadsheets; basic quality control and data manipulation in spreadsheets; and exporting data from spreadsheets. In this lesson, however, you will not learn about data analysis with spreadsheets. Much of your time as a researcher will be spent in the initial ‘data wrangling’ stage, where you need to organize the data to perform a proper analysis later. It’s not the most fun, but it is necessary. In this lesson you will learn how to think about data organization and some practices for more effective data wrangling. With this approach you can better format current data and plan new data collection so less data wrangling is needed.

Subject:: Applied Science; Information Science; Mathematics; Measurement and Data; Social Science
Material Type:: Module
Provider:: The Carpentries
Author:: David Mawdsley; Erin Becker; François Michonneau; Karen Word; Lachlan Deer; Peter Smyth
Date Added:: 08/07/2020