LeetCode 1236: Web Crawler Solution
1236. Web Crawler
Problem Explanation
To solve this problem, run a breadth-first search (BFS) from the given start URL, treating each page as a node and each link as an edge. The problem supplies an HtmlParser whose getUrls(url) method returns the links found on a page, and only URLs that share the start URL's hostname should be crawled. A set of visited URLs prevents processing the same URL twice, and a queue holds the URLs still to be processed. The search ends when the queue is empty.
Algorithm:
- Initialize a set to store visited URLs and a queue to store URLs to visit.
- Extract the hostname of the start URL.
- Add the start URL to the queue and mark it as visited.
- While the queue is not empty:
  - Dequeue a URL and add it to the result.
  - Call htmlParser.getUrls on that URL to obtain the links on the page.
  - For each link that has the same hostname as the start URL and has not been visited, mark it as visited and add it to the queue.
- Return the collected URLs once the queue is empty.
Time Complexity: The BFS traversal takes O(V + E) in the worst case, where V is the number of URLs crawled (those under the start hostname) and E is the number of links returned for those URLs. This assumes hash-set lookups are O(1) on average and treats each URL string as constant size.
Space Complexity: The space complexity is O(V) to store the visited URLs and the queue.
Solution Code
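import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class Solution {
    public List<String> crawl(String startUrl, HtmlParser htmlParser) {
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        String hostname = getHostname(startUrl);
        List<String> result = new ArrayList<>();

        queue.offer(startUrl);
        visited.add(startUrl);

        while (!queue.isEmpty()) {
            String currUrl = queue.poll();
            result.add(currUrl);
            // getUrls returns the links found on the page at currUrl
            List<String> links = htmlParser.getUrls(currUrl);
            for (String link : links) {
                // Follow only unvisited links on the start URL's host
                if (isSameHostname(link, hostname) && !visited.contains(link)) {
                    queue.offer(link);
                    visited.add(link);
                }
            }
        }
        return result;
    }

    // Returns the text between "http://" and the next '/'.
    // Assumes every URL starts with "http://" and has no port.
    private String getHostname(String url) {
        int start = "http://".length();
        int end = url.indexOf('/', start);
        return end == -1 ? url.substring(start) : url.substring(start, end);
    }

    // True when the URL's hostname equals the given hostname
    private boolean isSameHostname(String url, String hostname) {
        return getHostname(url).equals(hostname);
    }
}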
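The solution depends on an HtmlParser that is only available inside LeetCode's judge, so it cannot be run as-is on a local machine. The following is a minimal sketch of a local harness, not an official test setup: the CrawlerDemo class, its nested HtmlParser interface, the hostnameOf helper, and the a.example / b.example link graph are all made up for illustration. It assumes URLs start with "http://" and carry no port.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class CrawlerDemo {
    // Hypothetical local stand-in for the HtmlParser that LeetCode supplies.
    interface HtmlParser {
        List<String> getUrls(String url);
    }

    // Hostname = text between "http://" and the next '/' (assumes http, no port).
    static String hostnameOf(String url) {
        int start = "http://".length();
        int end = url.indexOf('/', start);
        return end == -1 ? url.substring(start) : url.substring(start, end);
    }

    // BFS with a queue and a visited set, restricted to the start URL's hostname.
    static List<String> crawl(String startUrl, HtmlParser parser) {
        String host = hostnameOf(startUrl);
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        List<String> result = new ArrayList<>();
        visited.add(startUrl);
        queue.offer(startUrl);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            result.add(url);
            for (String link : parser.getUrls(url)) {
                // Set.add returns false when the link was already seen.
                if (hostnameOf(link).equals(host) && visited.add(link)) {
                    queue.offer(link);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up link graph: a.example contains a cycle and one link to another host.
        Map<String, List<String>> links = Map.of(
            "http://a.example/home", List.of("http://a.example/docs", "http://b.example/"),
            "http://a.example/docs", List.of("http://a.example/home", "http://a.example/faq"),
            "http://a.example/faq", List.<String>of());
        HtmlParser parser = url -> links.getOrDefault(url, Collections.emptyList());
        // Prints [http://a.example/home, http://a.example/docs, http://a.example/faq]
        System.out.println(crawl("http://a.example/home", parser));
    }
}
```

The link back to the home page is skipped because it is already in the visited set, and http://b.example/ is skipped because its hostname differs from the start URL's. Backing the parser with a map keeps the harness deterministic and avoids any real network access.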
Frequently Asked Questions
How to solve LeetCode 1236 (Web Crawler)?
Run a breadth-first search from the start URL, following only links that share its hostname and tracking visited URLs in a set so that no URL is processed twice. The Java solution above implements this approach.
What is the time complexity of LeetCode 1236 (Web Crawler)?
The BFS approach described above runs in O(V + E) time and O(V) space, where V is the number of URLs and E is the number of links between them.