URL Structure

Visualizing a website’s URL structure with a treemap

url_structure

 url_structure (url_list, items_per_level=10, height=600, width=None,
                theme='none', domain='example.com', title='URL Structure')

Create a treemap of the first two directories in each URL's path, e.g. example.com/dir_1/dir_2/.

Parameter        Type      Default        Details
url_list         list                     Any list-like object with a bunch of URLs.
items_per_level  int       10             The number of items to display for each level of the treemap.
                                          All other items will be grouped under a special item called “Others”.
height           int       600            The height of the chart in pixels.
width            NoneType  None           The width of the chart in pixels.
theme            str       none           Name of theme to use for the chart. Available themes: ggplot2,
                                          seaborn, simple_white, plotly, plotly_white, plotly_dark,
                                          presentation, xgridoff, ygridoff, gridon, none.
domain           str       example.com    The main domain of the URL list. This will be displayed at the top
                                          panel in the treemap to display values like a breadcrumb.
title            str       URL Structure
Returns          plotly.graph_objects.Figure
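
To make "the first two URL path directories" concrete, here is a minimal standard-library sketch of how dir_1 and dir_2 could be pulled out of each URL and counted. It is an illustration only, not necessarily how url_structure does it internally; the helper name first_two_dirs and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def first_two_dirs(url):
    """Return (dir_1, dir_2) from a URL path; missing levels become ''."""
    # Split the path on "/" and drop empty segments (leading/trailing slashes)
    segments = [seg for seg in urlsplit(url).path.split("/") if seg]
    # Pad so that short paths still yield two values
    padded = segments[:2] + ["", ""]
    return padded[0], padded[1]


# Hypothetical sample URLs, following the example.com/dir_1/dir_2/ pattern
urls = [
    "https://example.com/dir_1/dir_2/page",
    "https://example.com/dir_1/dir_2/",
    "https://example.com/dir_1/other",
    "https://example.com/",
]

# Count how many URLs fall under each (dir_1, dir_2) pair
counts = Counter(first_two_dirs(u) for u in urls)
```

Counts like these are the kind of values a treemap sizes its rectangles by: one outer box per dir_1, subdivided by dir_2.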

Read a list of URLs from a text/CSV file

import os

import pandas as pd
from adviz import url_structure

# The data file lives under nbs/data; pick the path relative to the working directory
if os.getcwd().endswith('/nbs'):
    filepath = 'data/apple_url_list.csv'
else:
    filepath = 'nbs/data/apple_url_list.csv'

apple = pd.read_csv(filepath)
apple.head(10)
url
0 https://www.apple.com/ae/shop/accessories/all
1 https://www.apple.com/ae/shop/accessories/all/accessibility
2 https://www.apple.com/ae/shop/accessories/all/airtag
3 https://www.apple.com/ae/shop/accessories/all/beats
4 https://www.apple.com/ae/shop/accessories/all/beats-featured
5 https://www.apple.com/ae/shop/accessories/all/cases-protection
6 https://www.apple.com/ae/shop/accessories/all/creativity
7 https://www.apple.com/ae/shop/accessories/all/displays-mounts
8 https://www.apple.com/ae/shop/accessories/all/drones
9 https://www.apple.com/ae/shop/accessories/all/headphones-speakers
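
The example above covers the CSV case. For a plain text file with one URL per line, a sketch along these lines should work, assuming UTF-8 encoding and no header row. The helper read_url_list and the file name urls.txt are hypothetical, and a temporary file stands in for a real URL list.

```python
import tempfile
from pathlib import Path


def read_url_list(path):
    """Read one URL per line, skipping blank lines and surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo with a temporary file standing in for a real URL list
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "urls.txt"
    sample.write_text(
        "https://www.apple.com/ae/shop/accessories/all\n"
        "\n"
        "https://www.apple.com/ae/shop/accessories/all/airtag\n",
        encoding="utf-8",
    )
    urls = read_url_list(sample)
```

Since url_list accepts any list-like object, the resulting Python list can be passed to url_structure directly.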

Visualize the URL structure with the default settings

url_structure(apple['url'])

Number of values per level

url_structure(
    url_list=apple['url'],
    items_per_level=5)
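
The effect of items_per_level can be sketched as "keep the N most frequent items and sum the rest under Others". The following is a simplified illustration of that grouping, not the library's own code; the function name top_items_with_others and the sample directory names are assumptions.

```python
from collections import Counter


def top_items_with_others(values, items_per_level=10):
    """Keep the most frequent items; sum the remainder under 'Others'."""
    counts = Counter(values)
    top = counts.most_common(items_per_level)
    # Whatever is not covered by the top items is collapsed into one bucket
    remainder = sum(counts.values()) - sum(n for _, n in top)
    result = dict(top)
    if remainder:
        result["Others"] = remainder
    return result


# Hypothetical first-level directories extracted from a URL list
dirs = ["shop"] * 5 + ["watch"] * 3 + ["ipad"] * 2 + ["mac", "tv"]
grouped = top_items_with_others(dirs, items_per_level=3)
# grouped == {"shop": 5, "watch": 3, "ipad": 2, "Others": 2}
```

A lower items_per_level gives a less cluttered chart at the cost of a larger "Others" box; a higher value shows more of the long tail.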

Number of values per level

url_structure(
    url_list=apple['url'],
    items_per_level=25)

Pick a theme

url_structure(
    url_list=apple['url'],
    items_per_level=25,
    theme='plotly_dark')

Pick a theme

url_structure(
    url_list=apple['url'],
    items_per_level=15,
    theme='seaborn')

Set domain name and chart title

url_structure(
    url_list=apple['url'],
    items_per_level=15,
    theme='ggplot2',
    domain='apple.com',
    title='URL Structure: <b>apple.com</b><br>Raw data: <a href="data/apple_url_list.csv">Apple.com URLs</a>')