Wikipedia is one of the most comprehensive repositories of human knowledge available online. Python's wikipedia package provides a simple way to work with that content programmatically. This article covers how to install the package and walks through code examples for extracting summaries, page content, references, and categories.
Installation
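To begin working with Wikipedia's data in Python, the first step is to install the wikipedia package, which is the one the code in this article imports. You can install it by running the following command:
pip install wikipedia
This command downloads and installs the library along with its dependencies. Note that wikipedia-api is a separate package on PyPI with a different interface (it is imported as wikipediaapi), so installing it will not make the code in this article work.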
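A quick way to confirm that the installation worked is to check whether the module the code imports can be found. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


# The code in this article uses `import wikipedia`; `wikipediaapi` is the
# module name of the separate wikipedia-api package.
for name in ("wikipedia", "wikipediaapi"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

If wikipedia is reported as not installed, the import at the top of the example code will fail with ModuleNotFoundError.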
Code Example
After installing the package, the following Python code demonstrates various ways to retrieve information from Wikipedia:
# Import the Wikipedia library
import wikipedia
# Search for a topic and retrieve suggestions
print("Search Results for 'Programming':")
search_results = wikipedia.search("Programming")
print(search_results)
# Extract summary of a Wikipedia article (e.g., Linux)
print("\nSummary of Linux:")
summary = wikipedia.summary("Linux")
print(summary)
# Extract a limited number of sentences from a Wikipedia article (e.g., Android)
# The full title is used because the bare title "Android" is ambiguous
print("\nFirst 2 sentences of the Android article:")
short_summary = wikipedia.summary("Android (operating system)", sentences=2)
print(short_summary)
# Retrieve a full Wikipedia page object (e.g., Android operating system)
print("\nWikipedia page for 'Android (operating system)':")
page = wikipedia.page("Android (operating system)")
# Printing the object itself only shows its repr, so print specific attributes
print(page.title)
print(page.url)
# Retrieve metadata (title, references, categories) from the Wikipedia page
print("\nTitle, References, and Categories of 'Python (programming language)':")
python_page = wikipedia.page("Python (programming language)")
print(python_page.title)
# Extract the plain text content of the page
print("\nContent of 'Python (programming language)':")
print(python_page.content)
# Retrieve references from the page
print("\nReferences of 'Python (programming language)':")
print(python_page.references)
# Retrieve categories from the page
print("\nCategories of 'Python (programming language)':")
print(python_page.categories)
Breakdown of the Code
- Search for Topics
The wikipedia.search() function takes a search query and returns a list of matching article titles.
search_results = wikipedia.search("Programming")
print(search_results)
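Because the search returns several titles, a script usually has to decide which one to use. The following is a minimal sketch of one possible rule: prefer an exact, case-insensitive match and otherwise fall back to the first result. The helper name pick_title and the sample list are assumptions for illustration; the sample stands in for real search output so the code runs offline.

```python
from typing import Optional, Sequence


def pick_title(results: Sequence[str], query: str) -> Optional[str]:
    """Choose one title from a list of search results.

    Prefers a case-insensitive exact match for the query and otherwise
    falls back to the first result. Returns None for an empty list.
    """
    wanted = query.strip().casefold()
    for title in results:
        if title.strip().casefold() == wanted:
            return title
    return results[0] if results else None


# Sample data standing in for the output of wikipedia.search("Programming");
# the real results depend on Wikipedia's current search index.
sample_results = ["Computer programming", "Programming", "Programming language"]
print(pick_title(sample_results, "programming"))
```

With the library installed, the same helper could be applied directly to the list returned by wikipedia.search().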
- Extracting Summaries
The wikipedia.summary() function retrieves the summary of an article. You can also limit the summary to a specific number of sentences with the sentences argument.
summary = wikipedia.summary("Linux")
short_summary = wikipedia.summary("Android (operating system)", sentences=2)
print(summary)
print(short_summary)
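A summary lookup can fail when a title is ambiguous or does not exist. The wikipedia package raises DisambiguationError and PageError (both in wikipedia.exceptions) in those cases. The following is a minimal sketch of a wrapper that turns them into readable messages. The function name safe_summary and the optional client parameter are assumptions for illustration; the parameter exists only so that a stand-in object can be passed during testing.

```python
def safe_summary(title, sentences=2, client=None):
    """Return a short summary, or a readable message if the lookup fails.

    `client` is anything that looks like the `wikipedia` module; when it is
    omitted, the real library is imported on first use.
    """
    if client is None:
        import wikipedia as client  # requires `pip install wikipedia`

    try:
        return client.summary(title, sentences=sentences)
    except client.exceptions.DisambiguationError as err:
        # err.options lists the candidate titles for an ambiguous query
        choices = ", ".join(err.options[:3])
        return f"'{title}' is ambiguous; candidates include: {choices}"
    except client.exceptions.PageError:
        return f"No page found for '{title}'"
```

Calling safe_summary("Linux") with the library installed and network access available would return the first two sentences of the article, while an ambiguous title would return the list of candidates instead of raising.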
- Retrieving Full Wikipedia Page Data
The wikipedia.page() function returns a page object whose attributes expose the full page data, including the title, plain text content, references, and categories.
page = wikipedia.page("Android (operating system)")
print(page.content)
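The content attribute is one long string, so it is often useful to break it into sections. The following is a minimal sketch that assumes section headings appear in the text as lines such as "== History ==", which is how headings typically show up in plain text extracts. The function name split_sections and the sample text are made up for illustration.

```python
import re

# Matches heading lines such as "== History ==" or "=== Early years ===".
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_sections(content: str) -> dict:
    """Map each heading to the text under it; the lead goes under 'Introduction'."""
    sections = {}
    current = "Introduction"
    start = 0
    for match in HEADING.finditer(content):
        sections[current] = content[start:match.start()].strip()
        current = match.group(2)
        start = match.end()
    sections[current] = content[start:].strip()
    return sections


# Placeholder text standing in for page.content, so the sketch runs offline.
sample = (
    "Lead paragraph of the article.\n\n\n"
    "== History ==\nText of the history section.\n\n\n"
    "=== Early years ===\nText of a subsection.\n"
)
for heading, text in split_sections(sample).items():
    print(f"{heading}: {text}")
```

Headings that repeat within a page would overwrite each other in this simple dictionary, so a list of (heading, text) pairs may be a better fit for long articles.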
- Extracting Metadata (Title, References, Categories)
The content attribute of the object returned by wikipedia.page() provides the plain text of the article, while the references and categories attributes hold the article's external reference links and its categories as lists of strings.
python_page = wikipedia.page("Python (programming language)")
print(python_page.references)
print(python_page.categories)
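Both lists can be long, so some light post-processing helps. The following is a minimal sketch that counts reference URLs per host and drops housekeeping categories. The function names, the prefix list, and the sample data are all assumptions for illustration; the prefix list is a heuristic and is not something the wikipedia package provides.

```python
from collections import Counter
from urllib.parse import urlparse

# Heuristic prefixes for housekeeping categories (an assumption; adjust as needed).
MAINTENANCE_PREFIXES = ("Articles with", "All articles", "CS1", "Use dmy dates", "Use mdy dates")


def count_reference_hosts(references):
    """Count how many reference URLs point at each host name."""
    hosts = (urlparse(url).netloc.lower() for url in references)
    return Counter(host for host in hosts if host)


def topical_categories(categories):
    """Keep only categories that do not start with a maintenance prefix."""
    return [c for c in categories if not c.startswith(MAINTENANCE_PREFIXES)]


# Made-up sample data standing in for python_page.references and
# python_page.categories, so the sketch runs without network access.
sample_references = [
    "https://docs.python.org/3/",
    "https://docs.python.org/3/faq/general.html",
    "https://www.python.org/downloads/",
]
sample_categories = [
    "Articles with short description",
    "Programming languages",
    "Use dmy dates",
    "Object-oriented programming languages",
]

print(count_reference_hosts(sample_references).most_common())
print(topical_categories(sample_categories))
```

With the library installed, the same helpers could be called on python_page.references and python_page.categories directly.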
Conclusion
Using Python and the wikipedia library, developers can extract summaries, full page content, references, and categories from Wikipedia articles with only a few lines of code. This makes it practical to build knowledge-based applications or to integrate Wikipedia content into other platforms.