Demonstration

Author

Monika K, Bhuvi Kalarwal, Tanushree Deshmukh

Published

February 25, 2025

import pandas as pd
%pip install lxml

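The installation above may not be needed in every environment. pd.read_html() relies on an HTML parser: by default it tries lxml first and falls back to BeautifulSoup4 with html5lib, so at least one of those must be installed.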
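To see which of read_html()'s parser backends (lxml, or BeautifulSoup4 plus html5lib) are already installed, a quick standard-library check is enough. This is a minimal sketch: available_parsers is a hypothetical helper name, and it only checks that each module can be found, not which version is installed.

```python
import importlib.util


def available_parsers():
    """Report which of read_html's parser backends can be imported here."""
    # read_html's default flavor tries lxml first, then falls back to bs4 + html5lib
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("lxml", "bs4", "html5lib")
    }


print(available_parsers())
```

If lxml is reported as missing, installing it (or both bs4 and html5lib) is what the %pip cell above takes care of.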

pandas.read_html(): reading HTML tables into pandas DataFrames

Basic usage

Reading HTML tables

Let’s consider a Wikipedia page that contains several tables.

url = 'https://en.wikipedia.org/wiki/List_of_Indian_states_and_union_territories_by_literacy_rate'
tables = pd.read_html(url)
display(tables)
[    States and union territories of India ordered by
 0  Area Population GDP (per capita) Abbreviations...
 1                                                vte,
                          State or UT Census 2011[2]                \
                          State or UT        Average   Male Female   
 0                              India          74.04  82.14  65.46   
 1   A&N islands[UT][citation needed]          86.63  90.27  82.43   
 2                     Andhra Pradesh          67.02  74.88  59.15   
 3                  Arunachal Pradesh          65.38  72.55  57.70   
 4                              Assam          72.19  77.85  66.27   
 5                              Bihar          61.80  71.20  51.50   
 6                       Chhattisgarh          70.28  80.27  60.24   
 7                     Chandigarh[UT]          86.05  89.99  81.19   
 8         Dadra and Nagar Haveli[UT]          76.34  85.17  64.32   
 9                    Daman & Diu[UT]          87.10  91.54  79.55   
 10                         Delhi[UT]          86.21  90.94  80.76   
 11                               Goa          88.70  92.65  84.66   
 12                           Gujarat          78.03  85.75  69.68   
 13                           Haryana          75.55  84.06  65.94   
 14                  Himachal Pradesh          82.80  89.53  75.93   
 15                 Jammu and Kashmir          67.16  76.75  56.43   
 16                         Jharkhand          66.41  76.84  55.42   
 17                         Karnataka          75.36  82.47  68.08   
 18                            Kerala          94.00  96.11  91.07   
 19                   Lakshadweep[UT]          91.85  95.56  87.95   
 20                    Madhya Pradesh          69.32  78.73  59.24   
 21                       Maharashtra          82.34  88.38  75.87   
 22                           Manipur          76.94  83.58  70.26   
 23                          Meghalya          74.43  75.95  72.89   
 24                           Mizoram          91.33  93.35  89.27   
 25                          Nagaland          79.55  82.75  76.11   
 26                            Odisha          72.87  81.59  64.01   
 27                    Puducherry[UT]          85.85  91.26  80.67   
 28                            Punjab          75.84  80.44  70.73   
 29                         Rajasthan          66.11  79.19  52.12   
 30                            Sikkim          81.42  86.55  75.61   
 31                        Tamil Nadu          80.09  86.77  73.44   
 32                         Telangana              -      -      -   
 33                           Tripura          87.22  92.53  82.73   
 34                       Uttarakhand          78.82  87.40  70.01   
 35                     Uttar Pradesh          67.68  77.28  57.18   
 36                       West Bengal          76.36  81.69  70.54   
 
    NSO survey (2017)[3]                
                 Average   Male Female  
 0                  77.7   84.7   70.3  
 1                 86.27  90.11  81.84  
 2                  66.9     80   59.5  
 3                 66.95   73.4  59.50  
 4                  85.9   90.1   81.2  
 5                  70.9   79.7   60.5  
 6                  77.3   85.4   68.8  
 7                 86.43  90.54  81.38  
 8                 77.65  86.46  77.65  
 9                 87.07  91.48  87.07  
 10                88.70  82.40  93.70  
 11                 87.4  92.81  81.84  
 12                 82.4   89.5   74.8  
 13                 80.4   88.0   71.3  
 14                 86.6   92.9   80.5  
 15                 77.3   85.7   68.0  
 16                 74.3   83.0   64.7  
 17                 77.2   83.4   70.5  
 18                 96.2   97.4   95.2  
 19                92.28  96.11  88.25  
 20                 73.7   81.2   65.5  
 21                 84.8   90.7   78.4  
 22                79.85  86.49  73.17  
 23                75.48  77.17  73.78  
 24                91.58  93.72  89.40  
 25                80.11  83.29  80.11  
 26                 77.3   84.0   70.3  
 27                86.55  92.12  86.55  
 28                 83.7   88.5   78.5  
 29                 69.7   80.8   57.6  
 30                 82.2  87.29  76.43  
 31                 82.9   87.9   77.9  
 32                    -      -      -  
 33                87.75  92.18  83.15  
 34                 87.6   94.3   80.7  
 35                 73.0   81.8   63.4  
 36                80.50  84.80  76.10  ,
                   State/UT   1951   1961   1971   1981   1991   2001   2011
 0              A&N islands  30.30  40.07  51.15  63.19  73.02  81.30  86.63
 1           Andhra Pradesh      -  21.19  24.57  35.66  44.08  60.47  67.02
 2        Arunachal Pradesh      -   7.13  11.29  25.55  41.59  54.34  65.38
 3                    Assam  18.53  32.95  33.94      -  52.89  63.25  72.19
 4                    Bihar  13.49  21.95  23.17  32.32  37.49  47.00  61.80
 5               Chandigarh      -      -  70.43  74.80  77.81  81.94  86.05
 6             Chhattisgarh   9.41  18.14  24.08  32.63  42.91  64.66  70.28
 7   Dadra and Nagar Haveli      -      -  18.13  32.90  40.71  57.63  76.24
 8            Daman and Diu      -      -      -      -  71.20  78.18  87.10
 9                    Delhi      -  61.95  65.08  71.94  75.29  81.67  86.21
 10                     Goa  23.48  35.41  51.96  65.71  75.51  82.01  88.70
 11                 Gujarat  21.82  31.47  36.95  44.92  61.29  69.14  78.03
 12                 Haryana      -      -  25.71  37.13  55.85  67.91  75.55
 13        Himachal Pradesh      -      -      -      -  63.86  76.48  82.80
 14       Jammu and Kashmir      -  12.95  21.71  30.64      -  55.52  67.16
 15               Jharkhand  12.93  21.14  23.87  35.03  41.39  53.56  66.41
 16               Karnataka      -  29.80  36.83  46.21  56.04  66.06  75.36
 17                  Kerala  47.18  55.08  69.75  78.85  89.81  90.86  94.00
 18             Lakshadweep  15.23  27.15  51.76  68.42  81.78  86.66  91.85
 19          Madhya Pradesh  13.16  21.41  27.27  38.63  44.67  63.74  69.32
 20             Maharashtra  27.91  35.08  45.77  57.24  64.87  76.84  82.34
 21                 Manipur  12.57  36.04  38.47  49.66  59.89  70.50  76.94
 22                Meghalya      -  26.92  29.49  42.05  49.10  62.56  74.43
 23                 Mizoram  31.14  44.01  53.80  59.88  82.26  88.80  91.33
 24                Nagaland  10.52  21.95  33.78  50.28  61.65  66.59  79.55
 25                  Odisha  15.80  21.66  26.18  33.62  49.09  63.08  72.87
 26              Puducherry      -  43.65  53.38  65.14  74.74  81.24  85.85
 27                  Punjab      -      -  34.12  43.37  58.51  69.65  75.84
 28               Rajasthan    8.5  18.12  22.57  30.11  38.55  60.41  66.11
 29                  Sikkim      -      -  17.74  34.05  56.94  68.81  81.42
 30              Tamil Nadu      -  36.39  45.40  54.39  62.66  73.45  80.33
 31                 Tripura      -  20.24  30.98  50.10  60.44  73.19  87.22
 32           Uttar Pradesh  12.02  20.87  23.99  32.65  40.71  56.27  67.68
 33             Uttarakhand  18.93  18.05  33.26  46.06  57.75  71.62  78.82
 34             West Bengal  24.61  34.46  38.86  48.65  57.70  68.64  76.26
 35                   India  18.33  28.30  34.45  43.57  52.21  64.84  74.04,
   Social Group Rural        Urban        Rural + Urban       
   Social Group  Male Female  Male Female          Male Female
 0           ST  75.6   58.8  91.3   79.6          77.5   61.3
 1           SC  78.0   60.9  88.4   75.3          80.3   63.9
 2          OBC  81.7   64.2  91.1   80.5          84.4   68.9
 3       OTHERS  87.6   74.5  95.0   88.6          90.8   80.6
 4          ALL  81.5   65.0  92.2   82.8          84.7   70.3,
   Social Group Rural        Urban        Rural + Urban       
   Social Group  Male Female  Male Female          Male Female
 0        Hindu  81.8   64.5  93.4   83.8          85.1   70.0
 1       Muslim  77.4   64.8  85.8   75.6          80.6   68.8
 2    Christian  84.4   77.0  95.5   91.4          88.2   82.2
 3         Sikh  92.7   96.4  94.2   95.3          88.9   90.1
 4          ALL  81.5   65.0  92.2   82.8          84.7   70.0]
print(type(tables))
<class 'list'>
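read_html() returns one DataFrame for each <table> element it finds, which is why the result above is a list. The self-contained sketch below reproduces that behaviour without network access. It assumes a small hypothetical HTML string built from a few values in the output above. The markup is wrapped in io.StringIO because recent pandas versions warn when literal HTML is passed as a plain string.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page; the values are copied from the output above
HTML = """
<table>
  <tr><th>State</th><th>Literacy</th></tr>
  <tr><td>Kerala</td><td>94.00</td></tr>
  <tr><td>Bihar</td><td>61.80</td></tr>
</table>
<table>
  <tr><th>Year</th><th>India</th></tr>
  <tr><td>1951</td><td>18.33</td></tr>
  <tr><td>2011</td><td>74.04</td></tr>
</table>
"""

# StringIO lets read_html treat the string as a file-like object
demo_tables = pd.read_html(StringIO(HTML))
print(type(demo_tables), len(demo_tables))  # a list with one DataFrame per <table>
print(demo_tables[0])
```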

pd.read_html() returns a list, and each element of that list is already a DataFrame. No conversion is needed: a single table can be picked out by its position in the list and worked with directly.

print(type(tables[1]))
<class 'pandas.core.frame.DataFrame'>
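Because every element is already a DataFrame, attributes such as .shape can be used to find the table you want among many. The sketch below uses summarize_tables, a hypothetical helper, on two hand-built stand-in DataFrames so that it runs offline. Applied to the Wikipedia result, it would list each table's position and size.

```python
import pandas as pd


def summarize_tables(tables):
    """Return (position, n_rows, n_columns) for each DataFrame in a read_html() result."""
    return [(i, *df.shape) for i, df in enumerate(tables)]


# Hypothetical stand-ins for read_html output, so the sketch runs offline
example = [
    pd.DataFrame({"State": ["Goa", "Kerala"], "Average": [88.70, 94.00]}),
    pd.DataFrame({"Year": [1951], "India": [18.33]}),
]
for position, n_rows, n_cols in summarize_tables(example):
    print(f"tables[{position}]: {n_rows} rows x {n_cols} columns")
```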
display(tables[1].head())
                        State or UT Census 2011[2]          NSO survey (2017)[3]
                        State or UT Average    Male  Female Average    Male  Female
0                             India   74.04   82.14   65.46    77.7    84.7    70.3
1  A&N islands[UT][citation needed]   86.63   90.27   82.43   86.27   90.11   81.84
2                    Andhra Pradesh   67.02   74.88   59.15    66.9      80    59.5
3                 Arunachal Pradesh   65.38   72.55   57.70   66.95    73.4   59.50
4                             Assam   72.19   77.85   66.27    85.9    90.1    81.2
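The NSO survey values above are formatted inconsistently: 80 appears next to 59.5, and 59.50 next to 57.70. That pattern suggests the columns were read as text rather than numbers, probably because the full table uses "-" as a placeholder (see the Telangana row). read_html's na_values parameter can mark such placeholders as missing so that the column parses as floats. The sketch assumes a tiny hypothetical table reusing two rows of the literacy data, not the live page.

```python
from io import StringIO

import pandas as pd

# Hypothetical table reusing two rows from the literacy data; "-" marks a missing value
HTML = """
<table>
  <tr><th>State</th><th>Average</th></tr>
  <tr><td>Tamil Nadu</td><td>80.09</td></tr>
  <tr><td>Telangana</td><td>-</td></tr>
</table>
"""

raw = pd.read_html(StringIO(HTML))[0]                     # "-" is kept, so the column stays text
clean = pd.read_html(StringIO(HTML), na_values=["-"])[0]  # "-" becomes NaN, column parses as float

print(raw["Average"].dtype, clean["Average"].dtype)
print(clean["Average"].isna().tolist())  # [False, True]
```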

Demonstrating some features of pandas.read_html()

Using match:

match keeps only the tables whose text matches the given string or regular expression. This is useful on pages with many tables.
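match is treated as a regular expression. Its default, '.+', matches every table. If no table matches, read_html raises ValueError instead of returning an empty list. The sketch below shows both cases on a hypothetical two-table snippet, so it runs without fetching the page.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first mentions "Social Group"
HTML = """
<table>
  <tr><th>Social Group</th><th>Female</th></tr>
  <tr><td>ST</td><td>61.3</td></tr>
</table>
<table>
  <tr><th>State/UT</th><th>Literacy</th></tr>
  <tr><td>Goa</td><td>88.70</td></tr>
</table>
"""

matched = pd.read_html(StringIO(HTML), match="Social Group")
print(len(matched))  # 1

try:
    pd.read_html(StringIO(HTML), match="no table says this")
except ValueError as err:
    # read_html raises ValueError when no table matches the pattern
    print("No match:", err)
```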

tables = pd.read_html(url, match='Social Group')
display(tables[0].head())
   Social Group Rural           Urban           Rural + Urban
   Social Group    Male  Female    Male  Female    Male  Female
0            ST    75.6    58.8    91.3    79.6    77.5    61.3
1            SC    78.0    60.9    88.4    75.3    80.3    63.9
2           OBC    81.7    64.2    91.1    80.5    84.4    68.9
3        OTHERS    87.6    74.5    95.0    88.6    90.8    80.6
4           ALL    81.5    65.0    92.2    82.8    84.7    70.3
# tables already holds both matching tables, so there is no need to download the page again
display(tables[1].head())
   Social Group Rural           Urban           Rural + Urban
   Social Group    Male  Female    Male  Female    Male  Female
0         Hindu    81.8    64.5    93.4    83.8    85.1    70.0
1        Muslim    77.4    64.8    85.8    75.6    80.6    68.8
2     Christian    84.4    77.0    95.5    91.4    88.2    82.2
3          Sikh    92.7    96.4    94.2    95.3    88.9    90.1
4           ALL    81.5    65.0    92.2    82.8    84.7    70.0
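Both Social Group tables come back with two-level (MultiIndex) column labels: one level for the area and one for sex. A single column is selected with a tuple, and a top-level label selects the whole group. The sketch rebuilds a subset of the first table by hand (the Urban columns are omitted for brevity) so it runs without fetching the page.

```python
import pandas as pd

# Hand-built subset of the first Social Group table (Urban columns omitted)
columns = pd.MultiIndex.from_tuples([
    ("Social Group", "Social Group"),
    ("Rural", "Male"), ("Rural", "Female"),
    ("Rural + Urban", "Male"), ("Rural + Urban", "Female"),
])
df = pd.DataFrame(
    [["ST", 75.6, 58.8, 77.5, 61.3],
     ["SC", 78.0, 60.9, 80.3, 63.9]],
    columns=columns,
)

# A (top, bottom) tuple selects one column; a top-level label selects the whole group
female_total = df[("Rural + Urban", "Female")]
rural = df["Rural"]
print(female_total.tolist())   # [61.3, 63.9]
print(rural.columns.tolist())  # ['Male', 'Female']
```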

Using index_col and header:

index_col chooses which column becomes the row index, and header chooses which row (or rows) are used as the column labels.
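Here is a minimal offline sketch of index_col. It uses a hypothetical table built from two rows of the literacy data. The first column becomes the row index, so rows can then be looked up by name with .loc.

```python
from io import StringIO

import pandas as pd

# Hypothetical table with two rows copied from the literacy data
HTML = """
<table>
  <tr><th>State</th><th>Average</th><th>Male</th><th>Female</th></tr>
  <tr><td>Goa</td><td>88.70</td><td>92.65</td><td>84.66</td></tr>
  <tr><td>Kerala</td><td>94.00</td><td>96.11</td><td>91.07</td></tr>
</table>
"""

# index_col=0: use the first column ("State") as the row index
df = pd.read_html(StringIO(HTML), index_col=0)[0]
print(df.loc["Kerala", "Average"])  # 94.0
```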

df = pd.read_html(url, index_col=0)[1]
display(df.head())
State or UT                      Census 2011[2]          NSO survey (2017)[3]
State or UT                      Average    Male  Female Average    Male  Female
India                              74.04   82.14   65.46    77.7    84.7    70.3
A&N islands[UT][citation needed]   86.63   90.27   82.43   86.27   90.11   81.84
Andhra Pradesh                     67.02   74.88   59.15    66.9      80    59.5
Arunachal Pradesh                  65.38   72.55   57.70   66.95    73.4   59.50
Assam                              72.19   77.85   66.27    85.9    90.1    81.2

Now “State or UT” is the index instead of a regular column.
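With state names as the index, Wikipedia's footnote markers ([UT], [citation needed]) become part of each lookup key, so df.loc['A&N islands'] would raise a KeyError. One way to tidy them is a regular-expression replace on the index. strip_footnotes is a hypothetical helper, and its pattern assumes every marker is enclosed in square brackets.

```python
import pandas as pd


def strip_footnotes(index):
    """Remove bracketed markers such as '[UT]' or '[citation needed]' from index labels."""
    return index.str.replace(r"\[[^\]]*\]", "", regex=True).str.strip()


idx = pd.Index(["India", "A&N islands[UT][citation needed]", "Delhi[UT]"])
print(strip_footnotes(idx).tolist())  # ['India', 'A&N islands', 'Delhi']
```

On the DataFrame above, this would be applied as df.index = strip_footnotes(df.index).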

df = pd.read_html(url, header=1)[1]
display(df.head())
                        State or UT Average    Male  Female Average.1    Male.1  Female.1
0                             India   74.04   82.14   65.46      77.7      84.7      70.3
1  A&N islands[UT][citation needed]   86.63   90.27   82.43     86.27     90.11     81.84
2                    Andhra Pradesh   67.02   74.88   59.15      66.9        80      59.5
3                 Arunachal Pradesh   65.38   72.55   57.70     66.95      73.4     59.50
4                             Assam   72.19   77.85   66.27      85.9      90.1      81.2
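With header=1, the top header row (Census 2011[2], NSO survey (2017)[3]) is dropped. pandas then makes the repeated names unique by appending .1, which loses the information about which survey each column came from. An alternative is to keep the default two-row header and join its levels into single labels. flatten_columns is a hypothetical helper, sketched here on a hand-built subset of the columns.

```python
import pandas as pd


def flatten_columns(df, sep=" "):
    """Join MultiIndex column levels into single strings, skipping repeated parts."""
    flat = []
    for col in df.columns:
        parts = col if isinstance(col, tuple) else (col,)
        kept = []
        for part in map(str, parts):
            if part not in kept:  # e.g. ("State or UT", "State or UT") -> "State or UT"
                kept.append(part)
        flat.append(sep.join(kept))
    out = df.copy()
    out.columns = flat
    return out


columns = pd.MultiIndex.from_tuples([
    ("State or UT", "State or UT"),
    ("Census 2011[2]", "Average"),
    ("Census 2011[2]", "Male"),
    ("NSO survey (2017)[3]", "Average"),
])
df = pd.DataFrame([["India", 74.04, 82.14, 77.7]], columns=columns)
print(flatten_columns(df).columns.tolist())
# ['State or UT', 'Census 2011[2] Average', 'Census 2011[2] Male', 'NSO survey (2017)[3] Average']
```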