Exploring Gene Expression in Yeast using Microarrays

© Copyright 2004 Departments of Mathematics & Biology,

Davidson College, Davidson, NC 28035

 

In this lab, we will use the free, open-source program MAGIC Tool to explore the yeast diauxic shift microarray data published by DeRisi et al. in Science (1997). The software lets us measure the intensity of each spot on the microarray and compare any given spot to all the others, which helps us find patterns of gene expression under the given experimental conditions. Pat Brown, in whose lab the data were generated, has generously provided both raw and processed data for us to work with. We will look only at the processed data, but we will discuss how the data are processed. The files you need for this lab are linked as you go along. You will also need a copy of the DeRisi et al. 1997 reprint from Science.

Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale

 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used to carry out a comprehensive investigation of the temporal program of gene expression accompanying the metabolic shift from fermentation to respiration. The expression profiles observed for genes with known metabolic functions pointed to features of the metabolic reprogramming that occur during the diauxic shift, and the expression patterns of many previously uncharacterized genes provided clues to their possible functions. The same DNA microarrays were also used to identify genes whose expression was affected by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional activator YAP1. These results demonstrate the feasibility and utility of this approach to genomewide exploration of gene expression patterns.

 

Summary: Several programs are now available for analyzing the large datasets arising from cDNA microarray experiments. Most programs are expensive commercial packages or require expensive third party software. Some are freely available to academic researchers, but are limited to one operating system. MicroArray Genome Imaging and Clustering Tool (MAGIC Tool) is an open source program that works on all major platforms, and takes users "from tiff to gif". Several unique features of MAGIC Tool are particularly useful for research and teaching.

 

PART 1 & 2 are OPTIONAL

 

We will go through Part 1 together and discuss data processing in Part 2. You are welcome to work with the files described in these sections, but it is NOT required for this exercise. Processed data files are provided for exploring expression data in Part 3.

 

 

 

 

PART 1:       Creating the Gene List

 

STEP 1 – Microarray Layout

 

1)    The gene list (a.k.a. godlist) contains information about all the genes that have been printed on our microarrays. The first thing we need to do is to correlate the information in the gene list with the spots (actually genes) printed on the microarrays. We need to specify exactly how the genes are ordered by grid, row and column and this information is specified in the godlist. Click on the following link to download & open the file DeRisiGodList.xls using Excel. The file is in tab delimited text format (each column is separated by a tab). This "godlist" is associated with the DeRisi tiff files (one green and one red), describing where each gene is spotted on the microarray.

 

2)    The JPEG snapshot shown below is of the file 1309_ch1_OD690_green.tif; it shows the results of scanning one of the microarrays in the Cy3 channel. A grid, or sector, is a distinctly visible region on the chip.

 

3)    Study the godlist opened in step #1 and the image file opened in step #2 to help you answer the following questions:

 

       How many spots are on each microarray?

       How many grids (sometimes called sectors) are on each microarray?

       How many rows and columns are in each sector?

      

 

 Answer

 There are 6400 spots on each microarray, which you can tell by looking at the number of rows in the file within Excel.

The spots are arranged in four grids (or sectors). This can be determined by looking at the numbers in the column labeled Sector, and seeing that they match the number of distinct regions visible in the image.

There are 40 rows and 40 columns in each sector, which can be determined by looking at the numbers in the columns labeled Sector Row and Sector Column, and seeing that they match the number of rows and columns within each distinct region in the image.

Note that the overall orientation of the microarray is difficult to determine because of the symmetry: there are 2 metarows and 2 metacolumns, and each grid has the same number of rows as columns. Thus, the image could be rotated 90, 180, or 270 degrees, and the image would still have the same basic layout (4 grids, each containing 40 rows and 40 columns). However, we were able to verify that the tiff files were oriented correctly by matching the images with the figures in the paper.

 

Most microarray manufacturers now set the number of rows to be different from the number of columns, and mark the array on top or bottom so that the orientation is clear. A mark or special set of spots can also be used to help orient the image of the array.
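
The counts in the answer above can also be checked programmatically. The following is a minimal sketch, not part of MAGIC Tool: it assumes the godlist has been read into a list of records keyed by the column labels Sector, Sector Row and Sector Column, and it uses a synthetic stand-in (4 sectors of 40 rows by 40 columns) rather than the real DeRisiGodList.xls file. The function name summarize_layout is hypothetical.

```python
def summarize_layout(records):
    """Count spots, sectors, and the distinct row/column numbers in use."""
    sectors = {rec["Sector"] for rec in records}
    rows = {rec["Sector Row"] for rec in records}
    cols = {rec["Sector Column"] for rec in records}
    return {
        "spots": len(records),
        "sectors": len(sectors),
        "rows_per_sector": len(rows),
        "columns_per_sector": len(cols),
    }


# Synthetic stand-in for the godlist (assumption: 4 sectors x 40 rows x 40 columns).
records = [
    {"Sector": s, "Sector Row": r, "Sector Column": c}
    for s in range(1, 5)
    for r in range(1, 41)
    for c in range(1, 41)
]

layout = summarize_layout(records)
print(layout)
```

With the real file, the same function could be fed rows read with Python's csv module using a tab delimiter.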

 

4)    To use the godlist in MAGIC Tool, the spots must be listed systematically, first by grid, then by rows and columns within each grid. Using the Excel sorting function, we will modify the godlist so that the genes are listed in order of spot number. To do this, select all the genes (records) in the file by pressing Ctrl and A, then choose Data on the menu bar, then Sort, and sort by "Spot", ascending. Note that this results in the grids, rows and columns being ordered sequentially.

 

5)   Once the genes have been sorted in spot order, the MAGIC Tool orientation questions (see above, i.e. what is the correct order of the spots?) can be answered in only four ways; the other four ways are ruled out by the way the rows and columns are numbered for consecutive spots. List the four ways that are feasible. Answer

 

The spot orientation must be one of the following:

1.        Horizontal Left to Right, Vertical Top to Bottom, Spot 2 horizontal of Spot 1

2.        Horizontal Left to Right, Vertical Bottom to Top, Spot 2 horizontal of Spot 1

3.        Horizontal Right to Left, Vertical Top to Bottom, Spot 2 horizontal of Spot 1

4.        Horizontal Right to Left, Vertical Bottom to Top, Spot 2 horizontal of Spot 1

 

It is not possible for Spot 2 to be vertical of Spot 1, because all spots in Row 1 are listed before the first spot in Row 2.
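
The reasoning above (Spot 2 must be horizontal of Spot 1 because every spot in Row 1 is listed before any spot in Row 2) can be expressed as a small check. This is a sketch under stated assumptions: the records carry the column labels Spot, Sector Row and Sector Column, the data are synthetic and numbered row by row, and the function name spot2_direction is hypothetical.

```python
def spot2_direction(records):
    """Return 'horizontal' if Spot 2 shares a row with Spot 1, else 'vertical'."""
    by_spot = sorted(records, key=lambda rec: rec["Spot"])
    first, second = by_spot[0], by_spot[1]
    if first["Sector Row"] == second["Sector Row"]:
        return "horizontal"
    return "vertical"


# Synthetic single sector, numbered row by row (assumption, mirrors the text above).
records = [
    {"Spot": (r - 1) * 40 + c, "Sector Row": r, "Sector Column": c}
    for r in range(1, 41)
    for c in range(1, 41)
]

direction = spot2_direction(records)
print(direction)
```

If the spots had instead been numbered column by column, Spot 2 would sit in Row 2 and the function would report "vertical".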

 

 

 

6)      To determine the grid order (which is grid 1, 2, 3 and 4), whether the spots are numbered left to right or right to left horizontally, and whether they are numbered top to bottom or bottom to top vertically, we can use the godlist in conjunction with Figure 1 of the paper. For example, find YDL204 in Figure 1, and read its sector, row and column numbers in the Excel file. Use other ORF names (i.e. names that begin with Y) in Figure 1 to determine which is grid 2 and which is grid 4. Check your understanding of the godlist, and how it relates to the microarray image, by looking at the JPEG graphic of the array orientation below.

 

 

STEP 2 - Creating the Gene List

1.   The gene list for MAGIC Tool will be created from the Excel file we started with, which you should still have open, with the data sorted by spot number. This systematic ordering of genes is the first criterion for a MAGIC Tool gene list.

2.   A second requirement for the MAGIC Tool gene list is that the ORF names must appear in the first column. Using Excel, modify the godlist to meet this second criterion. (Select Column A, then click the Home tab and press Insert, on the right side in the Cells box, to insert a new column. Select Column F, "SeqName", grab the side of the column (the cursor turns into a hand), and drag the column to the Column A position. Select Column F again, which should now be empty, and click Delete in the Cells box to remove it.)

3.   The final requirement for the MAGIC Tool gene list is that there must not be any column headings. Modify the godlist to meet this third criterion. (Select Row 1, which contains the header names, then go to the Cells menu again and choose Edit, Delete.)

4.   Save this modified godlist, now a MAGIC Tool gene list, as a tab-delimited text file called derisi_genelist.txt. (From the menu, choose File, Save As; the format should be Text (Tab delimited).) Save the file in a new folder (Desktop, My Documents, MicroArray_YourName) or in a new folder on your memory stick. We will use this folder to store all data and processing files for the rest of this lab.
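
For readers who prefer a script to Excel, the three gene list criteria above (sorted by spot number, ORF names in the first column, no column headings) can be sketched in Python. This is an illustration only: the function name make_gene_list is hypothetical, the two sample rows and their spot positions are made up, and the column order of the real godlist may differ from the sample.

```python
import csv
import io


def make_gene_list(godlist_text):
    """Sort by Spot, move SeqName to the first column, and drop the header row."""
    reader = csv.DictReader(io.StringIO(godlist_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row["Spot"]))
    other_cols = [col for col in reader.fieldnames if col != "SeqName"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row["SeqName"]] + [row[col] for col in other_cols])
    return out.getvalue()


# Made-up two-row godlist, deliberately out of spot order.
godlist = (
    "Spot\tSector\tSector Row\tSector Column\tSeqName\n"
    "2\t1\t1\t2\tYAL002W\n"
    "1\t1\t1\t1\tYAL001C\n"
)

gene_list = make_gene_list(godlist)
print(gene_list)
```

To produce a file, the returned text could be written to derisi_genelist.txt with an ordinary open(..., "w") call.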

 

Jump to Part 3

 

 

  Part 2: Generating Expression Ratios

PART 2:        Creating the Project.

 

1.    Open MAGIC Tool by clicking the icon on your desktop.

2.    Create a project called derisi_lab. The program will create a folder called derisi_lab, and a file called derisi_lab.gprj in the derisi_lab folder. (See the MAGIC Tool project tutorial for a reminder of how projects are organized.)

3.   Without closing MAGIC Tool, go to your computer's folder system (Explorer on Windows) and copy the following files into the derisi_lab project folder you just created with MAGIC Tool. If you do not already have these files on your hard drive, download them by right-clicking on each link below and save to your project folder (i.e. derisi_lab).

                                           

                        derisi_genelist.txt (the gene list you created in the previous section,

                        or a provided version of the same)

                         

                        derisi_first5.exp (the expression file for the first 5 timepoints)

                        derisi_last2.exp ( the expression file for the last 2 timepoints)

                    

                         

                        four tiff files:

                                          1309_ch1_OD690_green.tif

                                          1309_ch2_OD690_red.tif

                                          1313_ch1_OD730_green.tif

                                          1313_ch2_OD730_red.tif

                         

                        two grid files:

                                          1309.grid

                                          1313.grid

                         

                        yeastgenes.info (the yeast gene annotation information file)

                         

4.   In MAGIC Tool, select Update Project under the Project menu. This copies all the files from the top-level project folder into their appropriate subfolders.

 

5.   In MAGIC Tool, under the "Build Expression File" menu, load the red and green image files for OD 6.9 (i.e. 1309 Ch 1 & 2). Load derisi_genelist.txt as the gene list. We then need to define the regions on the array to be measured, through addressing (gridding). When you begin the addressing step, the easiest way to do this is to choose "Load Saved Grid", use the 1309.grid grid file, and then click "Done". We will now create four expression files called my3_10, my5_10, my3_last2 and my5_last2, holding the 10-hour data point and the last 2 data points of the diauxic shift, using either a 3-pixel radius or a 5-pixel radius for integration as described below. Think about how changing the radius used for integrating the data will affect the results.

6.   During segmentation we define the area to be integrated by defining a circle radius around the microarray spots. We would like to create two different expression files to observe how integrating the data with different radii affects the outcome:

Under "Build Expression File", "Segmentation"

                        Using a fixed circle with a radius of 3 pixels, and total signal (without background subtraction), create a new expression file by clicking the button labeled "Create Expression File" at the bottom of the screen. Enter my3_10 for the filename and enter 10 in the "Enter Column Name" box (the number of hours that have passed between OD 0.14 and OD 6.9).

                        Using a fixed circle with a radius of 5 pixels, and total signal (without background subtraction), create a new file named my5_10, once again labeling the column 10.

7.   Repeat steps 5 and 6 (above) for the OD 7.3 array data, with the same settings noted above and with the following differences: For the OD 7.3 array, use the 1313.grid file for the 'gridding' step. When creating the expression files for OD 7.3, choose "append to file", selecting my3_10 or my5_10 as the file for appending, respectively. Enter the file names my3_last2 or my5_last2 for the new files, respectively. In each file, label the current column 12 in the "Enter Column Name" Box (see example). The 7.3 data will now be appended to the 6.9 data.

8.   Now we will see how your data compare to the published DeRisi data. To do this, we need to combine the data we have just processed with the original data from the DeRisi analysis by merging the files. Merging simply combines all the data so that MAGIC Tool can access them together and we can compare different data sets.

                        Use the command Merge Expression Files to combine the two expression files that you have just created, my5_last2.exp and my3_last2.exp, calling the result my_last2.exp (override the default name by simply typing over it). Accept the default nicknames for the two files, which will be appended to the column names. The merge will take a few minutes; you will not be able to open any menus until it is done.

                        Use the command Merge Expression Files to combine the existing expression file derisi_last2.exp with the merged expression file you just created, calling the result all_last2.exp. Important: you must select derisi_last2.exp as File #1, because all genes in File #1 need to be in File #2 for the merge to work properly.

                        Log base 2 transform the expression file.

                        From the Explore window, perform two-column plots comparing your 3 pixel segmentation to the published DeRisi data for the OD 7.3 array (12 hours into experiment), and your 5 pixel segmentation to the published DeRisi data for that same array. Each plot will take a minute or so to appear, so be patient.

                                          (i) Click on an outlier point in one of the plots, turning the point red and causing the ORF name to appear in the bottom right corner.

                                          (ii) Go to the other plot, and select the same gene from the drop-down menu in the bottom right corner. Is the ratio in the second plot closer to the published data, or even more different?

                                          (iii) Go back to Segmentation in the Build Expression File menu, which should still contain the OD 7.3 array. Jump to the gene you identified in step (i), and try to explain why the ratio at this particular spot was difficult to determine. Experiment with different segmentation methods to see what you think the best answer is for the ratio at this spot. Answer

                                          (iv) As time permits, explore more outliers in the first set of plots, and/or repeat the analysis with the OD 7.3 array.

                                          (v) Explain why it was important to log transform the data before looking for outliers in the two-column plots. Answer

 

 

9.   Use the command Merge Expression Files to combine the existing expression file derisi_first5.exp and the existing expression file derisi_last2.exp. Be sure to list the files in this order, and change the nicknames for both files to t. Call the merged file derisi.exp. After the merge is complete, examine derisi.exp using View / Edit Data, to be sure the column labels are in order.

10.  Add the gene information in yeastgenes.info to derisi.exp, forming derisi_i.exp. Use this merged and annotated file, which is the complete time course published by DeRisi, to answer the remaining questions.
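
To make the segmentation and log transform steps of Part 2 concrete, here is a minimal sketch of fixed-circle integration with total signal and no background subtraction. It is not MAGIC Tool's own code. The assumptions are that a pixel counts when its center lies within the radius of the spot center, and that the ratio is red over green; the images are tiny synthetic grids, and the function names are hypothetical.

```python
import math


def total_signal(image, center_row, center_col, radius):
    """Sum the pixels whose centers lie within `radius` of the spot center."""
    total = 0
    for r, line in enumerate(image):
        for c, value in enumerate(line):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                total += value
    return total


def log2_ratio(red_image, green_image, center, radius):
    """Log base 2 of the red/green total-signal ratio for one spot."""
    red = total_signal(red_image, center[0], center[1], radius)
    green = total_signal(green_image, center[0], center[1], radius)
    return math.log2(red / green)


# Synthetic 11 x 11 images: a bright red spot of radius 3 on a flat background.
SIZE, CENTER = 11, (5, 5)
green_image = [[100] * SIZE for _ in range(SIZE)]
red_image = [
    [400 if (r - 5) ** 2 + (c - 5) ** 2 <= 9 else 100 for c in range(SIZE)]
    for r in range(SIZE)
]

ratio_r3 = log2_ratio(red_image, green_image, CENTER, 3)
ratio_r5 = log2_ratio(red_image, green_image, CENTER, 5)
print(ratio_r3, ratio_r5)
```

In this toy case the 5-pixel circle takes in background pixels around the spot, so its ratio is pulled toward the background ratio. On a log base 2 scale, a two-fold increase (+1) and a two-fold decrease (-1) sit the same distance from zero.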

PART 3 -- Exploring Expression Ratios

 

 

Download the processed dataset by clicking the following link and saving (don't choose Open) the file derisi_i.exp.zip to a folder on your computer (e.g. My Documents). Open the folder and right-click the file, which will bring up a menu; under the WinZip menu, choose "Extract to Here" (this will expand the file from its compressed format into the same folder).

 

Start the Magic Tool program by clicking the Magic icon on your desktop.

 

Under the Project menu, click New Project, use a descriptive name, and save the project file (the program will automatically add a .gprj suffix, so you don't need to).

 

Under the project menu again, select add file. Select the file derisi_i copy.exp. You may have to navigate to the appropriate folder that contains the unzipped datafile.

 

Use the processed data set to answer the following questions by exploring the data.

 

Under the expression menu in the Magic program, select explore. The "answer links" for the first several questions demonstrate how the program can be used to extract information from the data set. Use similar approaches to answer all the subsequent questions.

 

NOTE: In the Plot Window (described below) you can click the small down arrow (it's beneath the FILE menu heading) to display information about the selected genes (shown in red in the graph). You can click on a single gene or select many genes to get more information.

 

1. For how many genes does expression change by at least a factor of 2 in the first two hours? (p. 680 in the Science article) Answer

2. For how many genes is expression greater than 2.0 or less than 0.5 in the time 0 microarray? How does this affect your interpretation of the answer to #1? Answer

3. For how many genes does expression increase by a factor of at least 4 sometime during the time course? For how many genes does it diminish by a factor of at least 4 sometime during the time course? (p. 680) Answer

4. Investigate the change in expression of ribosomal genes by forming a group of ribosomal genes, plotting the group, and highlighting the mitochondrial genes in the plot. (p. 681) Answer Explain the differences you see in the expression levels of the two groups.

5. Another way to look at the data is to build a two-column plot. Using the same data set as above (183 genes), instead of choosing "Plot Selected group" choose "Two Column Plot". For the first column choose 0_t and for the second column choose 10_t. Magic will now create your plot. On the plotting menu, under the Search heading, choose Gene Cellular Component Contains. Type Mitochondria in the search field and hit Enter. Explain the plot that you observe. Ignore the diagonal blue line (it's just a trend line) and pay attention to the location of the data points.

6. Do a new search to determine the number of kinase genes that are upregulated between the zero time point and the 10 hour time point. Hint: What genes have an expression level less than 2 at time point zero and greater than 2 at the 10 hour time point AND have Kinase listed as their molecular function? What types of activities are these genes involved in (don't list all activities, just any trend you notice)?

7. Which chromosome contains the highest number of genes that exhibited increased expression levels over time? Hint: Combine a search for increased expression levels (i.e. as in question 6) with a particular chromosome. Make sure that you select Group genes matching "ALL" selected criteria at the bottom of the window. What function do these genes perform? If the function is unknown, how might you get some clues about their function?

8. Do a new search for "dehydrogenase" under molecular function. Which gene has the greatest increase in expression level? Explain the role of this gene in the diauxic shift based on your reading and interpretation of the DeRisi article.

9. Search for "decarboxylase" under molecular function. This time look for the gene with the greatest decrease in expression and identify the gene name. Why is this gene down regulated? On the main Magic window select "View Data" under the Expression menu. A data window will pop up. Under the Edit menu select "find gene" and enter your gene name. Calculate the percent decrease in expression between time point 0 and time point 12. Calculate the number of genes (include all yeast genes) that undergo a similar decrease in expression. That is, count the genes that have a starting expression level greater than that of the decarboxylase gene and a final expression level (at 12 hours) less than that of the decarboxylase gene.
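
The arithmetic behind several of the questions above (fold-change thresholds of 2 and 4, and the percent decrease in question 9) can be sketched as follows. This is only an illustration of the calculations, not a substitute for exploring the data in Magic: the gene names and values are invented, the column label 12_t is assumed by analogy with 0_t and 10_t, and the function names are hypothetical.

```python
def changed_by_factor(expression, col_a, col_b, factor):
    """Genes whose ratio rose or fell by at least `factor` between two columns."""
    hits = []
    for gene, values in expression.items():
        fold = values[col_b] / values[col_a]
        if fold >= factor or fold <= 1 / factor:
            hits.append(gene)
    return hits


def percent_decrease(start, end):
    """Percent drop from `start` to `end`, relative to `start`."""
    return 100 * (start - end) / start


# Invented expression ratios for four placeholder genes.
expression = {
    "geneA": {"0_t": 1.0, "12_t": 4.2},
    "geneB": {"0_t": 1.1, "12_t": 0.25},
    "geneC": {"0_t": 0.9, "12_t": 1.2},
    "geneD": {"0_t": 2.0, "12_t": 0.8},
}

two_fold = changed_by_factor(expression, "0_t", "12_t", 2)
four_fold = changed_by_factor(expression, "0_t", "12_t", 4)
drop = percent_decrease(2.0, 0.8)
print(two_fold, four_fold, drop)
```

Note that a gene starting far from 1.0 at time 0, like the placeholder geneD, can show a large change between two columns while its final ratio stays within a factor of 2 of the reference, which is the issue question 2 asks you to consider.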