Python offers a versatile set of libraries for interacting with online databases and resources. One such resource is Wikipedia: with the wikipedia package, developers can programmatically search for articles and fetch summaries, full page content, and metadata. This guide focuses on using the package in a Windows environment, with step-by-step instructions for installation and usage along with Python code examples.
Setting Up Python in a Windows Environment
- Install Python
If you don’t already have Python installed, download it from the official website. During installation, check the box “Add Python to PATH” so that python and pip can be run from the Command Prompt.
- Install pip (if not already installed)
pip is the package manager for Python. It is included with the official Python installers, but if it is missing you can install it by running python -m ensurepip --upgrade or by following the official pip installation instructions.
- Install the wikipedia package
Once Python and pip are set up, open the Command Prompt (press Windows + R, type cmd, and press Enter) and run the following command to install the Wikipedia library:
pip install wikipedia
This downloads and installs the wikipedia package along with its dependencies. The code in this guide uses import wikipedia, which comes from this package; the similarly named wikipedia-api package is a different library (imported as wikipediaapi) with a different interface.
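Because several Wikipedia packages have similar names, it can help to confirm that the one the examples import is actually installed. The following is a minimal sketch using only the standard library (Python 3.8+ for importlib.metadata); check_package is a hypothetical helper name, and it assumes the PyPI distribution name you pass matches what you installed (wikipedia for pip install wikipedia).

```python
import importlib.util
from importlib import metadata


def check_package(module_name: str, dist_name: str) -> str:
    """Report whether module_name can be imported and which version of dist_name is installed."""
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not installed (try: pip install {dist_name})"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name (e.g. a stdlib module)
        version = "unknown version"
    return f"{module_name}: installed ({version})"


if __name__ == "__main__":
    # The examples in this guide use "import wikipedia", provided by the "wikipedia" distribution
    print(check_package("wikipedia", "wikipedia"))
```

Run it with python check_install.py (any file name works); if it reports “not installed”, rerun the pip command above.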
Code Example
Once the package is installed, you can use the following Python code in your Windows environment to retrieve data from Wikipedia:
- Create a Python File
Open Notepad (or a code editor or IDE such as VS Code or PyCharm) and create a new Python file named wikipedia_example.py.
- Python Code for Wikipedia Extraction
# Import the wikipedia package (installed with: pip install wikipedia)
import wikipedia
# 1. Search for a topic and retrieve a list of matching article titles
print("Search results for 'Programming':")
search_results = wikipedia.search("Programming")
print(search_results)
# 2. Extract the summary of a Wikipedia article (e.g., Linux)
# auto_suggest=False uses the exact title instead of the top search result
print("\nSummary of Linux:")
summary = wikipedia.summary("Linux", auto_suggest=False)
print(summary)
# 3. Extract a limited number of sentences from an article
# A bare "Android" can resolve to a disambiguation page, so use the full title
print("\nFirst 2 sentences of the 'Android (operating system)' article:")
short_summary = wikipedia.summary("Android (operating system)", sentences=2, auto_suggest=False)
print(short_summary)
# 4. Retrieve a full Wikipedia page object (e.g., Android operating system)
print("\nPage object for 'Android (operating system)':")
page = wikipedia.page("Android (operating system)", auto_suggest=False)
# print(page) would only show <WikipediaPage '...'>, so print its attributes
print(page.title)
print(page.url)
# 5. Retrieve metadata (title, references, categories) from another page
python_page = wikipedia.page("Python (programming language)", auto_suggest=False)
print("\nTitle of the page:")
print(python_page.title)
# 6. Extract the plain-text content of the page
print("\nContent of 'Python (programming language)':")
print(python_page.content)
# 7. Retrieve references (URLs of external links) from the page
print("\nReferences of 'Python (programming language)':")
print(python_page.references)
# 8. Retrieve categories from the page
print("\nCategories of 'Python (programming language)':")
print(python_page.categories)
- Save and Run the Script
Save the file and run it from the Command Prompt or your preferred Python IDE.
- Using Command Prompt: Navigate to the folder where your script is saved and run the following command:
python wikipedia_example.py
The script will fetch information from Wikipedia and display the output in the Command Prompt.
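Article text often contains characters outside the Windows ANSI code page. On Python 3.6 and later the console itself displays them, but if you redirect the output to a file (python wikipedia_example.py > output.txt), Python may encode it with the system code page and raise UnicodeEncodeError. A minimal sketch, assuming Python 3.7+ (where text streams gained reconfigure()), is to switch stdout to UTF-8 at the top of the script; use_utf8_stdout is a hypothetical helper name.

```python
import sys


def use_utf8_stdout() -> None:
    """Make print() write UTF-8, even when stdout is redirected to a file."""
    # reconfigure() exists on io.TextIOWrapper (Python 3.7+); other stream objects are left alone
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
```

Call use_utf8_stdout() once, before the first print. Running python -X utf8 wikipedia_example.py or setting the PYTHONIOENCODING=utf-8 environment variable are alternatives that need no code changes.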
Code Explanation
- Search for Topics
The script starts by using wikipedia.search() to search for a topic (e.g., “Programming”); the function returns a list of related article titles (up to 10 by default).
search_results = wikipedia.search("Programming")
print(search_results)
- Extracting Summaries
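The wikipedia.summary() function retrieves the introductory summary of the specified article (e.g., “Linux”); its sentences parameter limits the result to the first few sentences (e.g., two sentences for “Android (operating system)”). Passing auto_suggest=False tells the library to use the exact title rather than substituting the top search result.
summary = wikipedia.summary("Linux", auto_suggest=False)
short_summary = wikipedia.summary("Android (operating system)", sentences=2, auto_suggest=False)
print(summary)
print(short_summary)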
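- Retrieve Full Wikipedia Page
The wikipedia.page() function returns a WikipediaPage object for a specific article. Its content and metadata are exposed as attributes (such as title, url, content, references, and categories); printing the object itself only shows its title, so print the attributes you need.
page = wikipedia.page("Android (operating system)", auto_suggest=False)
print(page.title)
print(page.url)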
- Retrieve Metadata (Title, References, Categories)
The code then reads the page’s title, plain-text content, references (the URLs of external links on the page), and categories through attributes of the page object.
python_page = wikipedia.page("Python (programming language)", auto_suggest=False)
print(python_page.title)
print(python_page.content)
print(python_page.references)
print(python_page.categories)
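If you want to keep the metadata rather than only print it, one option is to copy the page attributes into a dictionary and save it as JSON. This is a minimal sketch: page_to_record and save_record are hypothetical helper names, and the field list assumes the title, url, categories, and references attributes of the object returned by wikipedia.page(). Writing with encoding="utf-8" and ensure_ascii=False keeps non-ASCII titles readable in the file.

```python
import json
from pathlib import Path

# Attributes of a wikipedia page object to keep (an assumption; adjust as needed)
DEFAULT_FIELDS = ("title", "url", "categories", "references")


def page_to_record(page, fields=DEFAULT_FIELDS) -> dict:
    """Copy the selected attributes of a page object into a plain dictionary."""
    return {name: getattr(page, name) for name in fields}


def save_record(record: dict, path) -> None:
    """Write the record as indented JSON, keeping non-ASCII characters readable."""
    text = json.dumps(record, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
```

For example, record = page_to_record(wikipedia.page("Python (programming language)")) followed by save_record(record, "python_page.json"). Reading categories and references triggers additional requests to Wikipedia, so this step may take a moment.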
Handling Errors
The wikipedia package raises exceptions when a request cannot be resolved to a single page, most commonly for ambiguous terms (DisambiguationError) or nonexistent pages (PageError). You can handle these exceptions as follows:
import wikipedia
try:
    page = wikipedia.page("Python (programming language)", auto_suggest=False)
    print(page.content)
except wikipedia.exceptions.DisambiguationError as e:
    # The title matches several articles; e.options lists the candidate titles
    print(f"DisambiguationError: {e.options}")
except wikipedia.exceptions.PageError:
    # No article with this title exists
    print("Page not found")
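Disambiguation and missing-page errors will not go away if you simply try again, but requests can also fail for transient reasons such as a dropped connection or a timeout. A minimal sketch of retrying with exponential backoff follows; retry is a hypothetical helper (not part of the wikipedia package), and which exceptions count as transient is an assumption you should adjust.

```python
import time


def retry(func, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call func() and retry it when it raises one of the exception types in retry_on.

    The delay doubles after each failed attempt (base_delay, 2 * base_delay, ...).
    If every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle the error
            time.sleep(base_delay * 2 ** attempt)
```

For example, retry(lambda: wikipedia.summary("Linux"), retry_on=(requests.exceptions.ConnectionError, wikipedia.exceptions.HTTPTimeoutError)) retries only network-level failures (requests is installed as a dependency of wikipedia) and lets DisambiguationError and PageError propagate to handlers like those above.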
Using the Windows Environment for Automation
Once set up, Python scripts can be run automatically in a Windows environment with Task Scheduler, and a batch file makes it easy to launch a script with a double-click or from a scheduled task.
Automating with Task Scheduler
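- Open Task Scheduler (press Windows + R, type taskschd.msc, and press Enter).
- Create a new task and set the trigger (e.g., every day at a specific time).
- Under the “Actions” tab, choose “Start a program”. Set “Program/script” to the full path of python.exe (run where python in the Command Prompt to find it) and “Add arguments” to the full path of your script; alternatively, point “Program/script” at the batch file described below.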
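The same task can also be created from the command line with the built-in schtasks tool, which is convenient when setting up several machines. The following sketch only builds the command, so it runs on any OS; the task name, start time, and paths are placeholders, and build_schtasks_command is a hypothetical helper. It uses the /Create, /TN, /TR, /SC, /ST, and /F options of schtasks; check schtasks /Create /? on your system before relying on it.

```python
import subprocess
import sys


def build_schtasks_command(task_name, script_path, start_time="09:00", python_exe=sys.executable):
    """Build (but do not run) a schtasks command that runs a Python script every day."""
    # /TR is the full command line to run; quote both paths in case they contain spaces
    task_run = f'"{python_exe}" "{script_path}"'
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task name shown in Task Scheduler
        "/TR", task_run,     # program and arguments to run
        "/SC", "DAILY",      # schedule type
        "/ST", start_time,   # start time in 24-hour HH:MM format
        "/F",                # overwrite an existing task with the same name
    ]


if __name__ == "__main__":
    cmd = build_schtasks_command("WikipediaExample", r"C:\path\to\your\script\wikipedia_example.py")
    print(subprocess.list2cmdline(cmd))
    # On Windows, uncomment the next line to actually create the task:
    # subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run escapes the embedded quotes as \", which should match the escaping shown in the schtasks documentation for /TR values containing spaces.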
Running with Batch Files
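You can create a batch file (run_wiki_script.bat) that runs your Python script:
@echo off
REM Run the script with the Python interpreter found on PATH
python "C:\path\to\your\script\wikipedia_example.py"
pause
Save the batch file and double-click it to run your Python script. The pause line keeps the window open so you can read the output; remove it if the batch file is started by Task Scheduler, because otherwise the task may wait indefinitely for a key press.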
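When the script runs from Task Scheduler, nobody watches the console, so printed output is easy to lose. One option is to have the script write to a log file instead; the sketch below uses the standard logging module with one UTF-8 log file per day. setup_file_logging, the wiki_script logger name, and the file-naming scheme are illustrative assumptions.

```python
import logging
from datetime import date
from pathlib import Path


def setup_file_logging(log_dir, name="wiki_script"):
    """Return a logger that appends to <log_dir>/wiki_YYYY-MM-DD.log, plus that file's path."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"wiki_{date.today():%Y-%m-%d}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Replace any handler from an earlier call so messages go to the file returned below
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_file
```

In wikipedia_example.py you could then call logger.info(summary) in place of print(summary), passing a folder such as C:\path\to\your\script\logs as log_dir.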
Conclusion
This guide demonstrates how to set up Python in a Windows environment to retrieve and process information from Wikipedia. With minimal effort, you can integrate the wikipedia package into your workflow for extracting and processing Wikipedia data. From basic article summaries to full content retrieval, the package offers numerous possibilities for working with one of the most comprehensive information repositories available.
References
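- wikipedia package documentation. (n.d.). https://wikipedia.readthedocs.io/
- PyPI. (n.d.). wikipedia. https://pypi.org/project/wikipedia/
- Wikipedia-API documentation. (n.d.). https://wikipedia-api.readthedocs.io/
- PyPI. (n.d.). wikipedia-api. https://pypi.org/project/wikipedia-api/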
- Python.org. (n.d.). Download Python. https://www.python.org/downloads/windows/
- Wikipedia. (n.d.). “Wikipedia, The Free Encyclopedia”. Wikipedia. https://www.wikipedia.org/