Top Selling Books

Project Proposal

The goal of this project was to collect data from a variety of sources, prepare the data, merge and store the datasets into a database, and create visualizations of the data.

Data

I obtained data from the following sources:

Goodreads-books (Kaggle) - This CSV file is a comprehensive list of books listed on Goodreads. It contains 11,127 books with the following 12 variables for each book:

bookID - unique identification number
title - title of the book
authors - names of the book's authors
average_rating - average rating on Goodreads
isbn - ten-digit unique identifier
isbn13 - thirteen-digit unique identifier
language_code - primary language
num_pages - total number of pages
rating_count - total number of ratings received
text_reviews_count - total number of reviews written
publication_date - date of book publication
publisher - name of book publisher

List of best-selling books (Wikipedia) - The list of best-selling books consists of the following variables:

book - book title
authors - names of the book's authors
original language - language in which the book was originally written
first published - year in which the book was first published
approximate sales - approximate number of copies sold, in millions
genre - the genre of the book

Open Library API - The Open Library API covers over 20 million book editions. I used the isbn13 number from the Goodreads CSV to obtain additional information about the books. The data extracted included title, genres, languages, and publish_country, among other fields.
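As a rough illustration of the lookup described above, the sketch below requests one edition record by ISBN and keeps a few fields. It assumes Open Library's `/isbn/<isbn>.json` edition endpoint and that language entries look like `{"key": "/languages/eng"}`. The function names are hypothetical and this is not necessarily how the project code is written.

```python
import json
from urllib.request import urlopen


def edition_url(isbn13: str) -> str:
    """Build the edition URL for one ISBN (endpoint shape is an assumption)."""
    return f"https://openlibrary.org/isbn/{isbn13.strip()}.json"


def extract_fields(record: dict) -> dict:
    """Keep only the fields of interest; anything missing becomes None."""
    # Assumed shape: languages are listed as {"key": "/languages/eng"}.
    languages = [
        lang["key"].rsplit("/", 1)[-1] for lang in record.get("languages", [])
    ]
    return {
        "title": record.get("title"),
        "genres": record.get("genres"),
        "languages": languages or None,
        "publish_country": record.get("publish_country"),
    }


def fetch_edition(isbn13: str, timeout: float = 10.0) -> dict:
    """Download one edition record and reduce it to the selected fields."""
    with urlopen(edition_url(isbn13), timeout=timeout) as response:
        return extract_fields(json.load(response))
```

Calling `fetch_edition` once per isbn13 value from the Goodreads CSV would produce one small dictionary per book. With roughly eleven thousand books, pausing between requests is a sensible courtesy to the service.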

Data Preparation

I applied data transformations and cleaning techniques to each of these datasets, as seen in the following:

Goodreads CSV
List of best-selling books
Open Library API

SQL Database and Visualizations

Once each dataset was prepared, I merged and stored them in a SQLite database and created visualizations of my findings, as seen in the following:

Code for SQL and Visuals
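The merge-and-store step can be sketched with Python's built-in `sqlite3` module. The table names, the reduced column set, and the choice to join on a normalized title are all assumptions made for illustration; the linked code may join on other keys.

```python
import sqlite3


def build_database(goodreads_rows, bestseller_rows, path=":memory:"):
    """Load both datasets into SQLite and merge them into a `books` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE goodreads (isbn13 TEXT PRIMARY KEY, title TEXT, average_rating REAL)"
    )
    conn.execute(
        "CREATE TABLE bestsellers (book TEXT, genre TEXT, approximate_sales REAL)"
    )
    # INSERT OR IGNORE skips rows whose isbn13 is already present.
    conn.executemany("INSERT OR IGNORE INTO goodreads VALUES (?, ?, ?)", goodreads_rows)
    conn.executemany("INSERT INTO bestsellers VALUES (?, ?, ?)", bestseller_rows)
    # Inner join on a case- and whitespace-insensitive title match (an assumption).
    conn.execute(
        """
        CREATE TABLE books AS
        SELECT g.isbn13, g.title, g.average_rating, b.genre, b.approximate_sales
        FROM goodreads AS g
        JOIN bestsellers AS b
          ON lower(trim(g.title)) = lower(trim(b.book))
        """
    )
    conn.commit()
    return conn


def top_selling(conn, limit=10):
    """Return (title, approximate_sales) pairs, highest sales first."""
    query = (
        "SELECT title, approximate_sales FROM books "
        "ORDER BY approximate_sales DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

The rows returned by `top_selling` are already in a shape that a plotting library can turn into a bar chart of titles against sales.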

Conclusion

This project provided many learning opportunities. As a whole, it walked me through the data wrangling process from start to finish. As a hands-on learner, I came away with a solid understanding of data wrangling.