Visualizing the Tate's Collection

Hacking the Humanities Midterm Project

Introduction

Using the Tate’s public data on the artists and artworks in its collection, I analyzed works acquired between 1900 and 1970 to better understand trends in acquisitions and in the gender and origin of artists. My primary goal was to create effective data visualizations to demonstrate my findings. I used OpenRefine and Google Sheets to clean and analyze the data, Flourish to create data visualizations, and WordPress to share my project with the public.

Here are my final visualizations, which condense this sprawling dataset into a few key points.

Sources

Although our professor provided us with a cleaned and reduced version of this dataset, I wanted to start from scratch to customize which time periods I was analyzing. I downloaded both the full artist and artwork datasets and loaded them into OpenRefine. Then, I joined the datasets into one project, following this tutorial, so that each artwork would be associated with the artist as well as information about the artist from the artist dataset. Finally, I cropped the combined dataset to only include works acquired between 1900 and 1970. This provided me with a large window to analyze but also kept the size of the data reasonable enough so that I could load it into Flourish.
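For readers who prefer code to OpenRefine, the join-and-crop step could be sketched roughly as follows in Python. This is only an illustration of the idea, not what I actually ran: the sample rows are made up, the source column names (artistId, gender, placeOfBirth) are assumptions about the raw files, and treating 1900 and 1970 as inclusive bounds is also an assumption.

```python
# Hypothetical sample rows; the source column names are assumptions,
# not necessarily the Tate's exact headers.
artists = [
    {"id": "1", "name": "Artist A", "gender": "Female",
     "placeOfBirth": "London, United Kingdom"},
    {"id": "2", "name": "Artist B", "gender": "Male",
     "placeOfBirth": "Paris, France"},
]
artworks = [
    {"id": "101", "artistId": "1", "acquisitionYear": "1925"},
    {"id": "102", "artistId": "2", "acquisitionYear": "1985"},
    {"id": "103", "artistId": "2", "acquisitionYear": "1900"},
    {"id": "104", "artistId": "9", "acquisitionYear": ""},
]


def join_and_filter(artworks, artists, start=1900, end=1970):
    """Attach artist details to each artwork and keep only works
    acquired between start and end (inclusive, by assumption)."""
    by_id = {artist["id"]: artist for artist in artists}
    combined = []
    for work in artworks:
        year = work["acquisitionYear"]
        # Skip rows with a missing or non-numeric acquisition year.
        if not year.isdigit() or not (start <= int(year) <= end):
            continue
        # Works whose artist is not in the artist table get blank details.
        artist = by_id.get(work["artistId"], {})
        combined.append({
            **work,
            "artistSex": artist.get("gender", ""),
            "artistOrigin": artist.get("placeOfBirth", ""),
        })
    return combined


combined = join_and_filter(artworks, artists)
```

Looking artists up in a dictionary keyed by id plays the same role as the cross-project lookup in OpenRefine: each artwork row gains the artist's details without duplicating the artist table.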

Processes

Now that I had all the data together, I needed to break it into smaller parts in order to see specific trends. I chose three areas of interest: artists’ gender, artists’ origin, and number of acquisitions. I switched to Google Sheets for the next portion of data analysis because I am more familiar with it than with OpenRefine.

First, I wanted to isolate the percentage of artists of each gender whose work was acquired each year. To do this, I inserted a pivot table with acquisitionYear as the rows, artistSex as the columns, and id as the values, summarized by COUNTA. This created a two-column dataset that I could feed into Flourish to create my visualization.
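The pivot table above amounts to counting rows per year and gender. Here is a minimal Python sketch of that idea, assuming each row is a dictionary with id, acquisitionYear, and artistSex keys; the sample rows are invented for illustration, and the percentage step is what Flourish's stacked-percentage chart otherwise does for you.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for the combined dataset.
rows = [
    {"id": "101", "acquisitionYear": 1925, "artistSex": "Male"},
    {"id": "102", "acquisitionYear": 1925, "artistSex": "Male"},
    {"id": "103", "acquisitionYear": 1925, "artistSex": "Male"},
    {"id": "104", "acquisitionYear": 1925, "artistSex": "Female"},
    {"id": "105", "acquisitionYear": 1926, "artistSex": "Female"},
]


def gender_pivot(rows):
    """Count works per acquisition year and gender.
    Rows with an empty id are skipped, mirroring COUNTA on the id column."""
    counts = defaultdict(Counter)
    for row in rows:
        if not row["id"]:
            continue
        counts[row["acquisitionYear"]][row["artistSex"]] += 1
    return {year: dict(by_sex) for year, by_sex in sorted(counts.items())}


def to_percentages(pivot):
    """Convert each year's counts into percentages of that year's total."""
    result = {}
    for year, by_sex in pivot.items():
        total = sum(by_sex.values())
        result[year] = {sex: 100 * n / total for sex, n in by_sex.items()}
    return result


pivot = gender_pivot(rows)
shares = to_percentages(pivot)
```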

Second, I repeated the process above for artistOrigin. Before creating the pivot table, I isolated the country in each artistOrigin cell so that the data was easier to work with. This is the formula I used:

=ARRAYFORMULA(IF(G2:G="", , TRIM(IFERROR(RIGHT(G2:G, LEN(G2:G) - FIND("@", SUBSTITUTE(G2:G, ",", "@", LEN(G2:G) - LEN(SUBSTITUTE(G2:G, ",", ""))))), G2:G))))
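The formula keeps whatever follows the last comma in each cell, falls back to the whole cell when there is no comma, and leaves blank cells blank. For comparison, the same logic might look like this in Python; the function name and sample values are hypothetical.

```python
def country_from_origin(origin):
    """Return the text after the last comma, trimmed of surrounding spaces.
    Cells without a comma are returned whole; blank cells stay blank."""
    if not origin:
        return ""
    # rsplit with maxsplit=1 splits only at the last comma.
    return origin.rsplit(",", 1)[-1].strip()


# Hypothetical example values in the shape of the artistOrigin column.
examples = ["London, United Kingdom", "France", ""]
countries = [country_from_origin(value) for value in examples]
```

One small difference: strip() only removes leading and trailing whitespace, while TRIM in Google Sheets also collapses repeated spaces inside the text.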

Finally, I repeated the process with acquisitionYear as the rows and id as the values, summarized by COUNTA, to look at how many works the Tate acquired each year.

Presentation

There are two components to my presentation: the Flourish visualizations and this website. I created three separate visualizations in Flourish that I customized to show what I wanted the viewer to take away. First, I chose a column chart with stacked percentages to clearly show the gender breakdown of works acquired. Because I made the x-axis the acquisition year, the viewer can see how the percentages change over time. Second, I chose a circle hierarchy chart to show the proportion of artists from each country. I initially wanted to create a map projection to show this more geographically, but it was less attractive and less clear. Third, I chose a simple line chart to show the change in the number of acquisitions over the years. For all three, I added descriptive titles, subtitles, and axis labels.

I chose to share my findings on a website so that they are easily accessible to the public. My visualizations are embedded and interactive to encourage engagement, and I have detailed all of my processes so that someone could replicate my work if they wanted. By using my own subdomain and installing WordPress, I have control over the content and its appearance. I kept my site simple so that the visualizations would stand out.

Significance

My results reveal many interesting details about the Tate’s collection. By breaking down the origin and gender of artists, we can see that the Tate’s acquisitions between 1900 and 1970 were overwhelmingly of works by male and European artists. The number of works by artists from the United Kingdom is especially striking, though it makes sense considering the location of the Tate. It would be interesting to look at historical events that align with the acquisition trends to see if there is a reason why the Tate acquired more or less in a given year (perhaps by looking at its budget). In the future, it would be good to add more recent data to see whether the Tate has balanced out artist gender and origin in recent years. While these visualizations give us a good idea of the collection 100 years ago, they fail to show how the Tate may have since addressed some of these gaps.