A Graphical Analysis of Women's Tops Sold on Goodwill's Website
by J. Peter
After 10ish years of second-hand shopping, I've started to ask myself a lot of questions about the clothes I've been buying, like, "Did someone die in this?" or, "Have thrift stores always been this pricy?" (the answer to the former being, "yeah, probably"). In the absence of any conclusive answers, I tried to get the data myself.
I set up a script that collected information on listings for more than four million women's shirts for sale through Goodwill's website, going back to mid-2014. The information is deeply flawed—a Goodwill online auction is very different from a Goodwill store—but we can get an idea of how thrift store offerings have changed through the years. There's more info on data collection method below.
1. Are Used Clothes Getting More Expensive?
In short, yes. Or rather, maybe. But if I keep making the "this data is imperfect" caveat, this is going to be a long piece, so just assume it's there from now on, and I'll address it more in a later section.
I started by finding the change in the median price of a woman's shirt. But overall median price increase only tells us so much. Have prices doubled because everything is getting more expensive? Or are Goodwills just selling more high-price items online? Or fewer inexpensive items?
In order to better understand how much the "average" item has increased, I found just t-shirts from a few common brands: American Eagle, Gap, Old Navy, Columbia, Champion, Under Armour, and Nike.
From July 2014 to December 2019, the median item price increased from $6 to $11.99. That's almost exactly a 100% increase.
Drilling down to a few mid-price retailers, we can observe an increase in median price of t-shirts from $3.49 to $7.99 (roughly a 129% increase) from 2014 to 2019.
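For anyone who wants to check the arithmetic, here is a minimal sketch of a per-month median and a percentage change, using only the Python standard library. The function names and the sample listings are hypothetical; the sample prices are made up and chosen only so that the two medians match the $6 and $11.99 figures quoted above.

```python
from collections import defaultdict
from statistics import median


def median_price_by_month(listings):
    """Group (month, price) pairs by month and return {month: median price}."""
    prices = defaultdict(list)
    for month, price in listings:
        prices[month].append(price)
    return {month: median(values) for month, values in prices.items()}


def percent_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100


# Made-up listings (not real data), three per month.
sample = [
    ("2014-07", 4.99), ("2014-07", 6.00), ("2014-07", 9.99),
    ("2019-12", 7.99), ("2019-12", 11.99), ("2019-12", 24.99),
]

medians = median_price_by_month(sample)
change = percent_change(medians["2014-07"], medians["2019-12"])
print(f"{change:.1f}%")  # 99.8%
```

The same `percent_change` helper can be pointed at any pair of prices in this section.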
In the interest of due diligence, I also confirmed that adjusting for inflation barely changes these results at all.
So that suggests that everything is getting more expensive, but it doesn't address our other questions: What's changed about the price distribution of items sold at Goodwill over time?
It looks like the percentage of under $5 items has decreased dramatically after an initial surge just prior to 2016, though the percentage of more expensive items has stayed more or less steady.
(I'll note here that the weird surge in expensive items post-October 2019 is more likely due to some data collection issues, discussed again in the last section, than to an actual change in price.)
For this, I split the items into three price ranges: under $5, over $10 and everything else. These ranges I picked pretty arbitrarily based purely on what I consider cheap and expensive (because distribution on this is whack, as seen in the chart below).
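The three-way split can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code: the function names are hypothetical, the sample prices are made up, and I'm assuming that items priced at exactly $5 or exactly $10 land in the middle range.

```python
from collections import Counter, defaultdict


def price_range(price):
    """Label a price as under $5, over $10, or everything else."""
    if price < 5:
        return "under $5"
    if price > 10:
        return "over $10"
    return "$5 to $10"  # assumption: the boundaries belong to the middle range


def range_shares_by_month(listings):
    """Return {month: {range label: percentage of that month's items}}."""
    counts = defaultdict(Counter)
    for month, price in listings:
        counts[month][price_range(price)] += 1
    shares = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        shares[month] = {label: 100 * n / total for label, n in counter.items()}
    return shares


# Made-up listings (not real data).
sample = [("2015-01", 3.99), ("2015-01", 4.99), ("2015-01", 7.99), ("2015-01", 12.99)]
shares = range_shares_by_month(sample)
print(shares["2015-01"])
```

Computing the shares per month rather than raw counts keeps months with more listings from dominating the picture.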
2. What Can Used Clothes Suggest About Fashion Trends?
Thrift stores are a funny thing when it comes to fashion. It's where your mom dumped your old Champion sweats when you got too cool for them, and where kids cooler than you pick up their vintage Champion sweats now that they're cool again.
So while the meaning of a surge in a certain type of item at Goodwill might be ambiguous, it would be a funny thing to have a huge dataset of clothing item descriptions and not use it to try to find out something about fashion trends, so here it is.
Ah, stream charts. The chart you don't so much read as experience. In this little number, the width (girth?) of each band represents the percentage of tops that included a given keyword in their description.
I identified these trends by using tf-idf (term frequency–inverse document frequency) at both the monthly and yearly level to find which words disproportionately appeared in a certain period, then selected those that best represent trends. I use the same technique below to look into how people describe brands.
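As a rough illustration of the idea, here is a bare-bones tf-idf that treats all the descriptions from one period as a single "document". The exact weighting used for the charts isn't stated, so this uses the plainest textbook variant (raw term frequency times the log of inverse document frequency). The function name and the sample descriptions are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_period(descriptions_by_period):
    """Map {period: [description, ...]} to {period: {word: tf-idf score}}."""
    # Pool every description from a period into one bag of words.
    term_counts = {
        period: Counter(word for text in texts for word in text.lower().split())
        for period, texts in descriptions_by_period.items()
    }
    n_periods = len(term_counts)

    # Document frequency: in how many periods does each word appear?
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores = {}
    for period, counts in term_counts.items():
        total = sum(counts.values())
        scores[period] = {
            word: (n / total) * math.log(n_periods / doc_freq[word])
            for word, n in counts.items()
        }
    return scores


# Made-up descriptions (not real data).
sample = {
    "2015": ["peplum top", "lace peplum blouse"],
    "2019": ["cold shoulder top", "lace blouse"],
}
scores = tfidf_by_period(sample)
top_2015 = max(scores["2015"], key=scores["2015"].get)
print(top_2015)  # peplum
```

Words that show up in every period, like "top" here, score zero, which is why the method surfaces period-specific terms rather than words that are merely common.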
3. What Can Thrift Stores Tell Us About Clothing Brands?
As in life, so it is online: For every untouched Givenchy original, there are a thousand worn-out Mossimo sweatshirts. The breakdown of brands over time is disappointing for any hypebeast. That said, the clothes may be absolute fire but we're talking clout here.
We get a slightly more nuanced view when we look at how these rankings have changed year-to-year.
If we want to talk price, we might also look at our most and least expensive brands. For this, I used our old arbitrary cut-offs of under $5, over $10, and everything in between to represent low cost, high cost, and mid cost items, respectively. I mean, here we don't see all that much in the way of surprises. There's still a high number of items from various in-house brands from Florida department store Bealls (namely Coral Bay, Reel Legends, and Dept 222), the significance of which I'll leave to people who actually know things about business.
We can also check out how brand popularity and price have changed over time.
But beyond just price figures, these data give us a unique gift: Garment descriptions. You can keep your flowery Mr. Peterman descriptions of safari detailing ready to conquer both the boardroom and the night markets of Morocco. If I want to know how a brand is perceived, give me the ad hoc descriptions of Goodwill employees not obliged to spread the gospel of any brand bible.
For this, I filtered out one- and two-letter words, things that were obvious misspellings, and keywords that were only associated with one particular brand. (I'll also take the opportunity to let the reader know that I used regex to determine brand name based on each top's description and there were at least a few mistakes.)
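Here is a minimal sketch of what regex-based brand detection and the short-word filter might look like. The brand list, pattern, and function names are all hypothetical (the brand names are just ones mentioned in this piece), and the actual patterns used for the charts may differ.

```python
import re

# Hypothetical brand list for illustration only.
BRANDS = ["American Eagle", "Old Navy", "Under Armour", "Gap", "Nike"]

# One case-insensitive alternation with word boundaries, so that "Gap"
# does not match inside a longer word such as "gapless".
BRAND_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(b) for b in BRANDS) + r")\b", re.IGNORECASE
)


def find_brand(description):
    """Return the first known brand found in a description, or None."""
    match = BRAND_PATTERN.search(description)
    if match is None:
        return None
    # Map the matched text back to the canonical spelling.
    lowered = match.group(1).lower()
    return next(b for b in BRANDS if b.lower() == lowered)


def keywords(description):
    """Lowercase words of three or more letters."""
    return [w for w in re.findall(r"[a-z]+", description.lower()) if len(w) > 2]


print(find_brand("OLD NAVY Womens Blue Tee Sz M"))  # Old Navy
print(find_brand("Mind the gap graphic tee"))       # Gap (a false positive)
print(keywords("Old Navy Blue V Neck Tee Sz M"))
```

The second call shows the kind of mistake this approach makes: a pattern can't tell a brand name from the same word used in a slogan.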
4. How Does Goodwill Clothing Change By Location?
Any savvy thrift shopper knows that not all thrift stores are created equal. My personal approach was always to go to the part of town where the rich people live. But does that strategy extend to the state level? To distribution centres and online sales rather than individual stores? Probably not, but let's pretend. Let's start by exploring how various values change over time by state.
FYI, the Goodwill location associated with the most sales is "Goodwill Industries of North Central PA, Inc.," which sold 2,643,509 items in our dataset. That's almost half of all sales, and more than 35 times higher than the next highest selling store's sales. If we ignore that so we can get an actually decent looking graph, we get the following:
Also, for the purposes of this chart's labels: GW = Goodwill, GWI = Goodwill Industries.
And just for good measure, let's do that thing the Pudding does where we make a map and show the brand most frequently sold there. While we're at it, we'll throw in a colour scale showing how expensive each state is.
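The numbers behind a map like this can be sketched as a per-state tally. The function name and sample rows are hypothetical, and I'm assuming "how expensive each state is" means the median listing price, which is only one reasonable choice.

```python
from collections import Counter, defaultdict
from statistics import median


def summarize_states(listings):
    """Map (state, brand, price) rows to {state: (most common brand, median price)}."""
    brands = defaultdict(Counter)
    prices = defaultdict(list)
    for state, brand, price in listings:
        brands[state][brand] += 1
        prices[state].append(price)
    return {
        state: (brands[state].most_common(1)[0][0], median(prices[state]))
        for state in brands
    }


# Made-up rows (not real data).
sample = [
    ("PA", "Old Navy", 5.99),
    ("PA", "Old Navy", 7.99),
    ("PA", "Gap", 9.99),
    ("FL", "Coral Bay", 4.99),
]
summary = summarize_states(sample)
print(summary["PA"])  # ('Old Navy', 7.99)
```

If two brands tie within a state, `most_common` returns whichever was seen first, so a real analysis would want an explicit tie-break.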
5. So, Exactly How Bad Are These Data and How Did You Get Them?
Honestly, pretty bad. But, from the beginning, I've been operating on the principle that some data are better than none for such non-essential questions as "What's the deal with Goodwill?". So, let me unpack how I got the data and the issues I'm aware of.
Basically, after finding that data on old listings aren't available on the Goodwill website, I went rogue. I set up a script on my computer to crawl through sale item pages on the Goodwill website item id by item id and checked the page name and description to see if it was a women's top. This was a slow process, but I wasn't interested in overwhelming the Goodwill website, and, again, there is no real pressing need for statistics on Goodwill sales.
After that, I looked for large gaps in consecutive item ids and ran my script to attempt to capture items that I didn't get the first time. Even after all this, there are still gaps in my data, which you can see just by looking.
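Finding those gaps is simple enough to sketch. This is an illustration, not the script itself: the function name, the `min_gap` threshold, and the sample ids are all hypothetical.

```python
def find_gaps(item_ids, min_gap=2):
    """Return (first missing id, last missing id) ranges between collected ids.

    Only gaps with at least min_gap missing ids are reported.
    """
    ids = sorted(set(item_ids))
    gaps = []
    for previous, current in zip(ids, ids[1:]):
        missing = current - previous - 1
        if missing >= min_gap:
            gaps.append((previous + 1, current - 1))
    return gaps


# Made-up ids (not real item ids).
collected = [100, 101, 102, 250, 251, 253, 400]
print(find_gaps(collected, min_gap=50))  # [(103, 249), (254, 399)]
```

Each returned range could then be fed back to the crawler as a list of ids to retry. A threshold is useful because small gaps are expected anyway: most item ids belong to things that aren't women's tops.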
But hey, for all the imperfection, this method yielded a total of 4,100,355 records of ladies' shirts. In my mind, this is enough to do some analysis, even if it's imperfect. For the record, I also deleted listings that seemed to be for multiple items sold in a bundle, so the original number of records was somewhat higher.
A bit more info on methods can be seen in my Python notebook (which is having some issues rendering online). The short version is that brand and state were determined by me based on the item name and seller name, respectively, and the other data were scraped.
This project was done without Goodwill's assistance or permission. Hopefully, though, they'll see this content as it's intended to be seen: As a very weird love letter to thrifting from a very weird person.
More of my nonsense.