Scraping 400+ books, building a bipartite graph, and using community detection to personalize book recommendations.


Most recommendation engines treat users and items as two separate lists and find correlations between them. But what if you modeled the relationship itself — the fact that a reader reviewed a book — as a network edge?

That’s the core question behind this project. And the answer turns out to be more powerful than a simple similarity matrix.


Starting With Real Data

Before building anything, we needed data. We scraped Goodreads across three loosely related genres — fantasy, science fiction, and thriller — collecting:

  • 400–500 books across the three categories
  • 125–150 reviewers per book (user IDs only, not review text)

The result: a dataset with four columns — genre, book_title, book_url, reviewer_id — and a web of connections hiding inside it.


What Is a Bipartite Network?

A standard (unipartite) network has one type of node — say, all users. A bipartite network has two distinct types of nodes with edges only running between types, never within.

In our case:

  • Type 1 nodes: Readers (reviewer IDs)
  • Type 2 nodes: Books (titles)
  • Edges: A reader reviewed a book
 
 
Reader A ──── Book X
Reader A ──── Book Y
Reader B ──── Book Y
Reader B ──── Book Z

This structure contains far more information than a flat ratings table. It encodes co-readership patterns, community membership, and recommendation pathways — all without a single star rating.
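As a minimal sketch of how the four-column dataset could become a bipartite structure, the snippet below builds two adjacency maps, one per node type. The rows, URLs, and the function name build_bipartite are illustrative assumptions, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical rows in the four-column layout described above:
# (genre, book_title, book_url, reviewer_id). URLs are placeholders.
rows = [
    ("fantasy", "Book X", "https://example.com/x", "reader_a"),
    ("fantasy", "Book Y", "https://example.com/y", "reader_a"),
    ("fantasy", "Book Y", "https://example.com/y", "reader_b"),
    ("thriller", "Book Z", "https://example.com/z", "reader_b"),
]

def build_bipartite(rows):
    """Return two adjacency maps: reader -> books and book -> readers."""
    books_by_reader = defaultdict(set)
    readers_by_book = defaultdict(set)
    for _genre, title, _url, reviewer in rows:
        # Every row is one edge; edges only ever join a reader to a book.
        books_by_reader[reviewer].add(title)
        readers_by_book[title].add(reviewer)
    return dict(books_by_reader), dict(readers_by_book)

books_by_reader, readers_by_book = build_bipartite(rows)
```

Keeping the two node types in separate maps means a reader can never be linked to another reader by accident, which is exactly the bipartite constraint.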


Visualizing the Network

The raw bipartite graph was extremely dense: nearly every reader was connected to multiple books, and highly reviewed books acted as hubs connecting massive clusters of readers.

To make the network analyzable, we applied a density threshold, keeping only connections that met a minimum co-occurrence count. The result was a sparser, more meaningful network where connections carry signal rather than noise.
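One way to read the thresholding step is as a filter on how many reviewers two books share. The sketch below assumes that reading; the toy data, the min_shared parameter, and the "at least" comparison are assumptions, since the exact cutoff rule is not specified here.

```python
from itertools import combinations

# Hypothetical toy data: book title -> set of reviewer IDs.
readers_by_book = {
    "Book X": {"r1", "r2", "r3"},
    "Book Y": {"r1", "r2", "r4"},
    "Book Z": {"r3", "r5"},
}

def threshold_pairs(readers_by_book, min_shared):
    """Keep book pairs whose shared-reviewer count is at least min_shared."""
    kept = {}
    for a, b in combinations(sorted(readers_by_book), 2):
        shared = len(readers_by_book[a] & readers_by_book[b])
        if shared >= min_shared:
            kept[(a, b)] = shared
    return kept

strong_pairs = threshold_pairs(readers_by_book, min_shared=2)
```

With this toy data only the Book X and Book Y pair survives, because the other pairs share one reviewer or none.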


Two Recommender Systems, One Dataset

The Bipartite Recommender

Logic: Find books that share many reviewers with books the target reader has already read. This leverages the full bipartite structure.

“You read Book X. Readers who also read Book X overwhelmingly read Book Y. Recommendation: Book Y.”

The key advantage: the recommendation path runs Reader → Book Read → Shared Reviewers → New Book. Seen from the book the reader started with, that is a two-hop walk through the reader layer, which captures richer co-reading patterns than a direct similarity score.
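The walk described above can be sketched in a few lines. This is an illustrative scoring rule, not necessarily the one the project used: each co-reviewer votes for their unseen books, weighted by how many books they share with the target reader. The reader names and the function recommend_bipartite are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: reader -> set of books they reviewed.
books_by_reader = {
    "ana": {"Book X"},
    "ben": {"Book X", "Book Y"},
    "cy": {"Book X", "Book Y", "Book Z"},
    "dee": {"Book Z"},
}

def recommend_bipartite(target, books_by_reader, top_n=3):
    """Rank unseen books by votes from readers who overlap with the target."""
    seen = books_by_reader[target]
    scores = Counter()
    for other, books in books_by_reader.items():
        if other == target:
            continue
        overlap = len(seen & books)  # hop 1: book -> co-reviewer
        if overlap == 0:
            continue
        for book in books - seen:    # hop 2: co-reviewer -> new book
            scores[book] += overlap
    # Sort by score descending, then title, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]

picks = recommend_bipartite("ana", books_by_reader)
```

Here both ben and cy overlap with ana on Book X, so Book Y collects two votes and ranks ahead of Book Z.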

The Unipartite Recommender

Project the bipartite graph onto the book layer: two books are connected if they share reviewers, weighted by how many reviewers they share. This creates a book-to-book similarity network.

“Book Y is highly connected to Book X in the book graph. You read Book X. Recommendation: Book Y.”
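A rough sketch of the projection step follows: every reader contributes one unit of weight to each pair of books they reviewed, and recommendations are simply the heaviest neighbors in the resulting book graph. The data and the function names project_onto_books and similar_books are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: reader -> set of books they reviewed.
books_by_reader = {
    "ana": {"Book X"},
    "ben": {"Book X", "Book Y"},
    "cy": {"Book X", "Book Y", "Book Z"},
    "dee": {"Book Z"},
}

def project_onto_books(books_by_reader):
    """Build a book-to-book graph weighted by the number of shared reviewers."""
    weights = defaultdict(dict)
    for books in books_by_reader.values():
        for a, b in combinations(sorted(books), 2):
            weights[a][b] = weights[a].get(b, 0) + 1
            weights[b][a] = weights[b].get(a, 0) + 1
    return weights

def similar_books(title, book_graph, top_n=3):
    """Return the top_n neighbors of a title as (book, shared_reviewers)."""
    neighbors = book_graph.get(title, {})
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

book_graph = project_onto_books(books_by_reader)
neighbors_of_x = similar_books("Book X", book_graph)
```

Once the book graph is built, the reader layer is no longer needed at query time, which is where the speed advantage of this approach comes from.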

Which Is Better?

The bipartite approach consistently surfaces more diverse recommendations — because it finds books that attract the same type of reader, not just books that overlap with one specific title. The unipartite approach is faster and easier to explain, but can over-index on popular books that appear in many co-reading pairs simply because they’re widely read.

For personalization, bipartite wins. For computational efficiency, unipartite is the pragmatic choice.


Community Detection: Finding Your Book Tribe

Here’s where the project gets genuinely interesting.

We ran community detection on the bipartite network to identify 3–4 natural reader–book clusters. These aren’t just genre groupings — they’re communities defined by who reads what alongside whom.

What emerged:

  • Community 1: High-fantasy readers with crossover into literary fiction
  • Community 2: Thriller/mystery readers with strong genre loyalty
  • Community 3: Science fiction readers who also reviewed fantasy crossovers
  • Community 4: A mixed-genre cluster concentrated around a handful of blockbuster titles

The communities didn’t map cleanly onto the genres we scraped. That’s the point — reading behavior creates its own taxonomy, one that marketing categories often miss.
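The write-up does not say which community detection algorithm was used, so the sketch below swaps in a simple asynchronous label propagation as a stand-in. The edge list, book choices, the "reader:" and "book:" prefixes, and the smallest-label tie-break are all assumptions chosen to keep the example deterministic; a modularity-based method from a graph library would be a reasonable alternative.

```python
from collections import Counter, defaultdict

# Hypothetical (reader, book) edges with one bridge reader, r2.
edges = [
    ("r1", "Dune"), ("r1", "Foundation"),
    ("r2", "Dune"), ("r2", "Foundation"), ("r2", "Gone Girl"),
    ("r3", "Gone Girl"), ("r3", "The Firm"),
    ("r4", "Gone Girl"), ("r4", "The Firm"),
]

def label_propagation(edges, max_rounds=20):
    """Group readers and books by repeatedly adopting the commonest neighbor label."""
    # Prefix node names so reader IDs and book titles can never collide.
    neighbors = defaultdict(set)
    for reader, book in edges:
        neighbors["reader:" + reader].add("book:" + book)
        neighbors["book:" + book].add("reader:" + reader)

    labels = {node: node for node in neighbors}  # each node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbors):  # fixed order, updates applied in place
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            # Deterministic tie-break: smallest label among the most common.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break

    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))

communities = label_propagation(edges)
```

On this toy graph the bridge reader r2 ends up with the science fiction cluster because two of their three reviews sit there, which mirrors the idea that communities follow reading behavior rather than genre labels.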


Why This Matters for a Platform Like Goodreads

Traditional recommendation: “You rated this 5 stars, here are similar books.”

Community-aware recommendation: “You belong to a reading community that consistently discovers certain books before they go mainstream. Here’s what that community is reading right now.”

The first is reactive. The second is predictive.

Community detection also enables targeted features:

  • “Reading alike” badges for users in the same community
  • Community-curated shelves that surface niche picks, not just bestsellers
  • Cross-community bridge recommendations for readers who might enjoy adjacent communities
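The bridge idea in the last bullet could be sketched as follows: for a given reader, look at what fellow community members have reviewed outside the home community. The community_of mapping is a hypothetical output of a community detection step, and the sketch assumes reader IDs and book titles never share a name.

```python
from collections import Counter

# Hypothetical toy data: reader -> set of books they reviewed.
books_by_reader = {
    "r1": {"Dune", "Foundation"},
    "r2": {"Dune", "Foundation", "Gone Girl"},
    "r3": {"Gone Girl", "The Firm"},
}

# Hypothetical community assignment for both readers and books.
community_of = {
    "r1": 0, "r2": 0, "Dune": 0, "Foundation": 0,
    "r3": 1, "Gone Girl": 1, "The Firm": 1,
}

def bridge_recommendations(target, books_by_reader, community_of, top_n=3):
    """Suggest books from other communities that the target's peers reviewed."""
    home = community_of[target]
    seen = books_by_reader[target]
    votes = Counter()
    for reader, books in books_by_reader.items():
        if reader == target or community_of[reader] != home:
            continue  # only peers from the same community get a vote
        for book in books - seen:
            if community_of[book] != home:
                votes[book] += 1
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]

bridges = bridge_recommendations("r1", books_by_reader, community_of)
```

In this example r1 is pointed to Gone Girl because a peer from the same community, r2, has already crossed over to it.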

The Technical Takeaway

Bipartite networks are underused in recommendation systems. The extra structural information — the explicit separation of two node types — gives you community detection, two-hop recommendation paths, and projection flexibility, all from the same underlying data.

The Goodreads use case is a clean example. But the same framework applies anywhere you have users interacting with items: streaming platforms, e-commerce, research paper citation networks, job-candidate matching.

The structure of the relationship is data too. Use it.


 

→ View on GitHub: Goodreads-Bipartite-Recommender
