Brian W. O'Shea - Getting started with research


There is a standard set of skills and tools that are useful to members of my computational astrophysics research group. If you're interested in doing research with me, you will need some facility with the Linux command line interface, one of the simulation codes our group works with (Enzo, Enzo-E, Athena++, K-Athena, or Athena-PK), the yt data analysis and visualization tool, and the Git version control system. To that end, I've put together a list of things that you should do as we begin to work together. While this list is somewhat long, it is an accurate reflection of the software tools and practices that are necessary to be a productive member of my research group. You are not expected to be an expert in all of these skills and tools on Day 1 - you'll master them over time!

Onboarding with the research group

Before getting started with the following list of tasks, make sure that Brian adds you to the group's Mattermost server, to our group on the ICER supercomputer (including access to the buy-in compute nodes and shared disk partitions), and to any ACCESS supercomputers that are needed for your work. These are the fundamental communication and computational platforms that our group uses. Also, create a GitHub account and share your username with Brian!

The Big List

Before embarking on your research project, please complete the following list of tasks. I recommend doing them in the order below. Also, while you are working through this list, please don't be afraid to ask for help from me or from other members of the research group! Much of the software we use is under active development, and the documentation may be inaccurate or non-existent.

  1. Learn some of the Python programming language, which is the language that you will use to write most of your analysis, plotting, and visualization scripts. There are many ways to do this: the official Python tutorial (also see the "Getting started" page, which has many resources), the Software Carpentry lesson on Programming with Python, the Codecademy Python track, or enrolling in CMSE 201, which is MSU's Python-based "Introduction to Computational Modeling and Data Analysis" course. I also strongly recommend installing the Anaconda Python distribution, since it has most of the packages our group uses (like matplotlib and scipy). (Note: this is taught in CMSE 201 and 202!)
  2. Learn to make plots with Python's matplotlib plotting library, using the latest stable version. In particular, work through the pyplot tutorial. (Note: this is taught in CMSE 201 and 202!)
  3. Get access to a computer that has a Unix-like terminal environment and the GNU C/C++/Fortran compilers. If you have a Mac or a PC that runs the Linux operating system, you already have this (though you may need to install compilers on your Mac - see this web page for instructions). If you have a Windows machine, you have two options. If you have Windows 10 or later, the easiest is to install the Windows Subsystem for Linux (WSL), and then install a Linux distribution in it. Otherwise, you should consider setting up a Linux partition on your Windows machine so that you can dual-boot -- I recommend the Ubuntu distribution. See the download page for Ubuntu's desktop version to obtain Ubuntu, and for instructions on how to install it on a Windows machine. Note that you have many options if you're not comfortable with dual-booting - please talk to me about this! Also, if you plan to install Linux on your Windows machine, back up any important files first! There's always a remote possibility that something can go wrong, and you don't want to lose any data. Note that the MSU supercomputer (at ICER) is also a good option for this, and has the benefit of being a shared platform that most of the research group uses for their work!
  4. Go to the Software Carpentry website and work through their lesson on the Unix shell. (Note: this skill is taught in CMSE 202)
  5. Create an account on GitHub, and experiment with creating a Git repository. Here is a good tutorial on how to use Git, and the GitHub resources page is also excellent. Note that you should sign up for the free GitHub student plan once you've created your account - see this page for more information. (Note: this skill is taught in CMSE 202!)
  6. Install the yt code. I recommend installing the development version of yt. You can get the install script from the yt web page, and installation should be relatively automatic (particularly if you already installed the Anaconda Python distribution).
  7. Using the yt Quickstart and the yt cookbook, do some simple analysis of a previously-run Enzo cosmology simulation. In particular, do slices and projections of density and temperature, phase plots of temperature and density, halo-finding, radial profiles of temperature and density for the most massive halo in the volume, and volume rendering of the density and temperature fields. To download a pre-existing Enzo simulation, click here for several Enzo data outputs (warning: 2.4 GB) or click here for a single Enzo data output from the same simulation (80 MB). A much wider variety of data is available at the yt project's data page if you'd like to experiment with different types of simulation data, from both the Enzo code and others. Several very large datasets (of cosmological simulations, turbulence, and other scenarios) are available in the group's shared disk space on the ICER supercomputer.
  8. Obtain and compile the Enzo simulation code, using Enzo's makefile system (i.e., not using the yt install script) - it's important to figure out how to do it this way. See the Enzo documentation for instructions on how to do this. Note that if your project uses one of the other simulation tools that we use (e.g., Enzo-E, Athena++, K-Athena, Athena-PK), obtain and compile that one instead!
  9. Run an Enzo (or Enzo-E, Athena++, K-Athena, or Athena-PK) test problem. For Enzo, you can find the parameter files in the Enzo distribution's 'run' directory. I recommend running the Sod Shock Tube (in run/Hydro/Hydro-1D/SodShockTube) and also the Kelvin-Helmholtz AMR test (in run/Hydro/Hydro-2D/KelvinHelmholtzAMR). Note that both of those directories contain scripts that you can use to make plots of the simulation outputs with yt - you should be sure to do so. The other codes mentioned have similar directories; look around a little and you will find them!
  10. Run a simple Enzo cosmology simulation (or something similar in the code you're working with). For Enzo, you can find instructions on how to run a simple simulation here, and the sample inits and Enzo parameter files given there produce calculations that run in a short time (less than an hour). You can also use the cosmology simulations included with the Enzo source distribution, in the run/CosmologySimulation directory, or click here to download a set of inits and Enzo parameter files that I've generated for a small cosmology simulation that will run in roughly an hour on 4 cores of a modern desktop computer (note: simulation requires the file!).
  11. Use yt to analyze the simulation you have just run, doing the same analysis tasks that you did for the pre-existing Enzo calculation. Do you get results that look similar? (Note: if you are using the parameter files that I generated and are worried that you are getting weird results, click here for the final simulation output from a small cosmological calculation.)
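
For task 3, a quick way to see whether your machine already has the tools you need is to ask Python which of them are on your PATH. The sketch below uses only the standard library. The list of tool names is my assumption about a typical setup (GNU compilers, Make, Git, and Python 3), not an official requirement list, so adjust it to your system.

```python
import shutil

# Assumed tool names for illustration; edit this list to match your setup.
REQUIRED_TOOLS = ["gcc", "g++", "gfortran", "make", "git", "python3"]


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Format one line per tool: the name, then its path or NOT FOUND."""
    lines = []
    for name, path in found.items():
        status = path if path else "NOT FOUND"
        lines.append(f"{name:<10} {status}")
    return "\n".join(lines)


tools = find_tools(REQUIRED_TOOLS)
print(report(tools))

# Collect anything that still needs to be installed.
missing = [name for name, path in tools.items() if path is None]
if missing:
    print("Missing tools:", ", ".join(missing))
```

If anything is reported as missing, install it with your system's package manager (or load the corresponding module on a cluster) before trying to compile a simulation code.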
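
For task 2, here is a minimal sketch of the kind of plot you will make constantly: a quantity versus radius on log-log axes, with labeled axes, saved to a file. The data are synthetic (a made-up power law chosen purely for illustration), and the function name and output filename are hypothetical.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # write image files without needing a display
import matplotlib.pyplot as plt
import numpy as np


def plot_power_law(filename="density_profile.png"):
    """Plot a synthetic power-law density profile on log-log axes and save it."""
    # Made-up data for illustration only: rho(r) = rho0 * (r / r0)**-2
    radius = np.logspace(0, 3, 50)  # 1 to 1000, nominally in kpc
    density = 1.0e-24 * (radius / 1.0) ** -2.0

    fig, ax = plt.subplots(figsize=(5, 4))
    ax.loglog(radius, density, marker="o", markersize=3,
              label=r"$\rho \propto r^{-2}$")
    ax.set_xlabel("radius (kpc)")
    ax.set_ylabel(r"density (g cm$^{-3}$)")
    ax.set_title("Synthetic radial density profile")
    ax.legend()
    fig.tight_layout()

    out = Path(filename)
    fig.savefig(out, dpi=150)
    plt.close(fig)  # free the figure's memory once it is on disk
    return out


saved = plot_power_law()
print(f"wrote {saved}")
```

Selecting the Agg backend is a choice I made so that the script also runs on a remote machine with no display; on your laptop you can drop that line and call plt.show() instead.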
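
For tasks 7 and 11, the following is a rough sketch of how part of that analysis might look as a single yt script: slices, density-weighted projections, a phase plot, and radial profiles. It is a starting point under stated assumptions rather than the definitive way to do it. The function name, the output prefix, and the 500 kpc sphere radius are all hypothetical choices, and the densest cell is used as a simple stand-in for the center of the most massive halo (a proper halo finder is the better approach). Halo-finding and volume rendering are left out.

```python
# Sketch only: running quicklook() requires yt and an Enzo output on disk.
FIELDS = [("gas", "density"), ("gas", "temperature")]


def quicklook(dataset_path, axis="z", prefix="quicklook"):
    """Save slices, projections, a phase plot, and radial profiles for one output."""
    import yt  # imported here so this file still loads where yt is not installed

    ds = yt.load(dataset_path)

    for field in FIELDS:
        # Slice through the domain center along the chosen axis.
        yt.SlicePlot(ds, axis, field).save(prefix)
        # Density-weighted projection: a line-of-sight average of the field.
        yt.ProjectionPlot(ds, axis, field,
                          weight_field=("gas", "density")).save(prefix)

    # Phase plot: total gas mass in each (density, temperature) bin.
    ad = ds.all_data()
    phase = yt.PhasePlot(ad, ("gas", "density"), ("gas", "temperature"),
                         ("gas", "cell_mass"), weight_field=None)
    phase.save(prefix)

    # Assumption: the densest cell approximates the center of the largest halo.
    _, center = ds.find_max(("gas", "density"))
    sphere = ds.sphere(center, (500.0, "kpc"))  # radius is an arbitrary choice

    # Mass-weighted radial profiles of density and temperature.
    profile = yt.ProfilePlot(sphere, ("index", "radius"), FIELDS,
                             weight_field=("gas", "cell_mass"))
    profile.set_unit(("index", "radius"), "kpc")
    profile.save(prefix)
    return ds
```

To use it, call quicklook with the path to one of your data outputs, for example quicklook("path/to/your/output"), where the path is a placeholder for wherever you unpacked or wrote the simulation data. Running it once on the downloaded output and once on your own run gives you two sets of images to compare side by side.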

Assuming you have completed all of the tasks listed above, congratulations! It's time to do some research!