Top Selling Books
Project Proposal
The goal of this project was to collect data from a variety of sources, prepare the data, merge and store the datasets into a database, and create visualizations of the data.
Data
I obtained data from the following sources:
- Goodreads-books (Kaggle) - This CSV file is a comprehensive list of books listed on Goodreads. It contains 11,127 books with the following 12 variables for each book:
  - bookID - unique identification number
  - title - title of the book
  - authors - names of the book's authors
  - average_rating - average rating on Goodreads
  - isbn - ten-digit unique identifier
  - isbn13 - thirteen-digit unique identifier
  - language_code - primary language
  - num_pages - total number of pages
  - rating_count - total number of ratings received
  - text_reviews_count - total number of reviews written
  - publication_date - date of book publication
  - publisher - name of the book's publisher
- List of best-selling books (Wikipedia) - The list of best-selling books consists of the following variables:
  - book - book title
  - authors - names of the book's authors
  - original language - language in which the book was originally written
  - first published - year in which the book was first published
  - approximate sales - approximate number of copies sold, in millions
  - genre - the genre of the book
- Open Library API - The Open Library API provides information on over 20 million book editions. I used the isbn13 number from the Goodreads CSV to obtain additional information about the books. The data extracted included title, genres, languages, publish_country, and other fields.
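A minimal sketch of how an ISBN-13 lookup against Open Library could work. It assumes the per-ISBN edition endpoint (`https://openlibrary.org/isbn/<isbn>.json`); the project's notebook may have used a different endpoint. The helper names `build_isbn_url`, `extract_fields`, and `fetch_book` are hypothetical, and not every edition record carries all four fields.

```python
import json
from urllib.request import urlopen

# Assumption: the per-ISBN edition endpoint; the project may have used another one.
OPEN_LIBRARY_ISBN_URL = "https://openlibrary.org/isbn/{isbn}.json"

# Fields named in the source list above.
FIELDS = ("title", "genres", "languages", "publish_country")


def build_isbn_url(isbn13):
    """Return the lookup URL for one ISBN-13, ignoring hyphens and spaces."""
    digits = "".join(ch for ch in str(isbn13) if ch.isdigit())
    if len(digits) != 13:
        raise ValueError(f"expected 13 digits, got {isbn13!r}")
    return OPEN_LIBRARY_ISBN_URL.format(isbn=digits)


def extract_fields(record):
    """Keep only the fields of interest; keys missing from the record become None."""
    return {name: record.get(name) for name in FIELDS}


def fetch_book(isbn13, timeout=10.0):
    """Download one edition record and reduce it to FIELDS (needs network access)."""
    with urlopen(build_isbn_url(isbn13), timeout=timeout) as response:
        return extract_fields(json.load(response))
```

Calling `fetch_book` once per isbn13 value in the Goodreads CSV would produce one small dictionary per book, which can then be written to a table alongside the other datasets.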
Data Preparation
I applied data transformations and cleaning techniques to each of these datasets, as seen in the following:
Goodreads CSV
List of best-selling books
Open Library API
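The linked notebooks hold the actual preparation steps. As a rough illustration of the kind of cleaning involved, the sketch below trims whitespace from column names and values, converts numeric columns, and skips rows that cannot be parsed. The function name `clean_goodreads_rows` and the sample rows are invented for this example, and the specific rules (for instance, dropping rows without a 13-character isbn13) are assumptions rather than the project's documented choices.

```python
import csv
import io


def clean_goodreads_rows(csv_file):
    """Yield cleaned rows from a Goodreads-style CSV file object."""
    for raw in csv.DictReader(csv_file):
        # Strip stray whitespace from column names and values;
        # the None key holds overflow fields from malformed rows, so drop it.
        row = {
            key.strip(): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        try:
            row["average_rating"] = float(row["average_rating"])
            row["num_pages"] = int(row["num_pages"])
        except (KeyError, ValueError):
            continue  # skip rows whose numeric columns cannot be parsed
        if len(row.get("isbn13", "")) != 13:
            continue  # skip rows without a usable ISBN-13
        yield row


# Toy data, invented for illustration only.
sample = io.StringIO(
    "bookID,title,average_rating,isbn13,  num_pages\n"
    "1,Example Book A,4.25,9780000000001,320\n"
    "2,Example Book B,not a number,9780000000002,150\n"
    "3,Example Book C,3.90,123,200\n"
)
cleaned = list(clean_goodreads_rows(sample))
```

Only the first toy row survives: the second has an unparseable rating and the third has a malformed isbn13.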
SQL Database and Visualizations
Once each dataset was prepared, I merged them, stored them in a SQLite database, and created visualizations of my findings. The code for these steps is linked at the end of this section.
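The store-and-merge step can be sketched with Python's built-in `sqlite3` module. The table names, the reduced column set, the toy rows, and the choice of a case-insensitive title match as the join key are all assumptions made for illustration; the linked code may merge on different columns.

```python
import sqlite3

# Toy rows, invented for illustration only.
goodreads_rows = [
    ("Example Book A", "Author One", 4.25),
    ("Example Book B", "Author Two", 3.9),
]
best_seller_rows = [("example book a", "Author One", 120.0)]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute("CREATE TABLE goodreads (title TEXT, authors TEXT, average_rating REAL)")
conn.execute("CREATE TABLE best_sellers (book TEXT, authors TEXT, approximate_sales REAL)")
conn.executemany("INSERT INTO goodreads VALUES (?, ?, ?)", goodreads_rows)
conn.executemany("INSERT INTO best_sellers VALUES (?, ?, ?)", best_seller_rows)

# Inner join: keep only books that appear in both tables.
merged = conn.execute(
    """
    SELECT g.title, g.average_rating, b.approximate_sales
    FROM goodreads AS g
    JOIN best_sellers AS b ON lower(g.title) = lower(b.book)
    """
).fetchall()
conn.close()
```

An inner join keeps only titles present in both sources, which suits comparisons between ratings and sales; a `LEFT JOIN` would keep every Goodreads book instead.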
Code for SQL and Visuals
Conclusion
This project provided many learning opportunities. As a whole, it walked me through the data wrangling process in depth, from start to finish. As a hands-on learner, I feel it gave me a solid understanding of data wrangling.