HackerRank Detect the Domain Name Solution

Hello Programmers. In this post, you will learn how to solve the HackerRank Detect the Domain Name problem, which is part of the HackerRank Regex series.

One more thing: don’t jump straight to the solutions. First try to solve the HackerRank problems yourself, and look at the solutions only if you are still stuck after several attempts.

Problem

You will be provided with a chunk of HTML markup. Your task is to identify the unique domain names from the links or URLs present in the markup fragment.

For example, if the link http://www.hackerrank.com/contest is present in the markup, you should detect the domain hackerrank.com. If second-level or higher-level domains are present in the markup, each of them must be treated as unique. For instance, if the links http://www.xyz.com/news, https://abc.xyz.com/jobs, and http://abcd.xyz.com/jobs2 are present in the markup, then [xyz.com, abc.xyz.com, abcd.xyz.com] should all be identified as unique domains. Prefixes like “www.” and “ww2.”, if present, should be scrubbed from the domain name.
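
The rule above can be sketched in Python as follows. This is only a minimal illustration, not the reference solution: the name `extract_domains` and the pattern `URL_RE` are assumptions made for this example, and the pattern strips only the two prefixes the statement names.

```python
import re

# Assumed pattern: scheme, an optional "www." or "ww2." prefix, then a
# dotted host name that ends in an alphabetic label.
URL_RE = re.compile(r"https?://(?:www\.|ww2\.)?((?:[A-Za-z0-9-]+\.)+[A-Za-z]+)")

def extract_domains(markup):
    """Return the set of unique domain names found in markup."""
    return set(URL_RE.findall(markup))

links = "http://www.xyz.com/news https://abc.xyz.com/jobs http://abcd.xyz.com/jobs2"
print(sorted(extract_domains(links)))
# prints: ['abc.xyz.com', 'abcd.xyz.com', 'xyz.com']
```

Collecting the matches in a set handles the uniqueness requirement, so a domain that appears in several links is reported once.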

Input Format

An integer N, equal to the number of lines in the HTML fragment that follows.
A chunk of HTML markup with embedded links, N lines long.
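
Reading this input format might look like the sketch below. The helper name `read_fragment` is hypothetical, and the example feeds it an in-memory stream for demonstration; a real submission would presumably pass `sys.stdin` instead.

```python
import io

def read_fragment(stream):
    """Read the line count N, then return the next N lines as a list."""
    n = int(stream.readline().strip())
    return [stream.readline().rstrip("\n") for _ in range(n)]

# Demonstration on an in-memory stream holding N = 2 and two markup lines.
sample = io.StringIO('2\n<a href="http://www.xyz.com/news">news</a>\n<p>no links here</p>\n')
print(read_fragment(sample))
```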

Output Format

One line, containing the list of detected domains, separated by semicolons, in lexicographical order. Do not leave any leading or trailing spaces either at the ends of the line, or before and after the individual domain names.
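
The output rule can be sketched in a few lines of Python. The function name `format_domains` is an assumption for this example; it deduplicates the domains, sorts them lexicographically, and joins them with semicolons without adding any padding.

```python
def format_domains(domains):
    """Join unique domains with ';' in lexicographical order, no padding."""
    return ";".join(sorted({d.strip() for d in domains}))

print(format_domains(["web.archive.org", "bnsf.com", "askoxford.com", "bnsf.com"]))
# prints: askoxford.com;bnsf.com;web.archive.org
```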

Sample Input

10
<div class="reflist" style="list-style-type: decimal;">
<ol class="references">
<li id="cite_note-1"><span class="mw-cite-backlink"><b>^ ["Train (noun)"](http://www.askoxford.com/concise_oed/train?view=uk). <i>(definition – Compact OED)</i>. Oxford University Press<span class="reference-accessdate">. Retrieved 2008-03-18</span>.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.atitle=Train+%28noun%29&rft.genre=article&rft_id=http%3A%2F%2Fwww.askoxford.com%2Fconcise_oed%2Ftrain%3Fview%3Duk&rft.jtitle=%28definition+%E2%80%93+Compact+OED%29&rft.pub=Oxford+University+Press&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-2"><span class="mw-cite-backlink"><b>^</b></span> <span class="reference-text"><span class="citation book">Atchison, Topeka and Santa Fe Railway (1948). <i>Rules: Operating Department</i>. p. 7.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.au=Atchison%2C+Topeka+and+Santa+Fe+Railway&rft.aulast=Atchison%2C+Topeka+and+Santa+Fe+Railway&rft.btitle=Rules%3A+Operating+Department&rft.date=1948&rft.genre=book&rft.pages=7&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-3"><span class="mw-cite-backlink"><b>^ [Hydrogen trains](http://www.hydrogencarsnow.com/blog2/index.php/hydrogen-vehicles/i-hear-the-hydrogen-train-a-comin-its-rolling-round-the-bend/)</span></li>
<li id="cite_note-4"><span class="mw-cite-backlink"><b>^ [Vehicle Projects Inc. Fuel cell locomotive](http://www.bnsf.com/media/news/articles/2008/01/2008-01-09a.html)</span></li>
<li id="cite_note-5"><span class="mw-cite-backlink"><b>^</b></span> <span class="reference-text"><span class="citation book">Central Japan Railway (2006). <i>Central Japan Railway Data Book 2006</i>. p. 16.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.au=Central+Japan+Railway&rft.aulast=Central+Japan+Railway&rft.btitle=Central+Japan+Railway+Data+Book+2006&rft.date=2006&rft.genre=book&rft.pages=16&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook" class="Z3988"><span style="display:none;"> </span></span></span></li>
<li id="cite_note-6"><span class="mw-cite-backlink"><b>^ ["Overview Of the existing Mumbai Suburban Railway"](http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm). _Official webpage of Mumbai Railway Vikas Corporation_. Archived from [the original](http://www.mrvc.indianrail.gov.in/overview.htm) on 2008-06-20<span class="reference-accessdate">. Retrieved 2008-12-11</span>.</span><span title="ctx_ver=Z39.88-2004&rfr_id=info%3Asid%2Fen.wikipedia.org%3ATrain&rft.atitle=Overview+Of+the+existing+Mumbai+Suburban+Railway&rft.genre=article&rft_id=http%3A%2F%2Fwww.mrvc.indianrail.gov.in%2Foverview.htm&rft.jtitle=Official+webpage+of+Mumbai+Railway+Vikas+Corporation&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal" class="Z3988"><span style="display:none;"> </span></span></span></li>
</ol>
</div>

Sample Output

askoxford.com;bnsf.com;hydrogencarsnow.com;mrvc.indianrail.gov.in;web.archive.org

HackerRank Detect the Domain Name Solutions in Cpp

#include <iostream>
#include <string>
#include <regex>
#include <set>
#include <algorithm>
#include <iterator>
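// Returns all text from stdin, keeping a newline between lines so that a
// URL ending one line cannot fuse with the text starting the next
std::string get_text() {
  std::string text;
  std::string line;
  while (std::getline(std::cin, line)) {
    text += line;
    text += '\n';
  }
  return text;
}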
template <class ForwardIt>
void print_delimited(ForwardIt first, ForwardIt last, char del) {
  if (first == last) return;
  std::cout << *first++;
  for (; first != last; ++first) std::cout << del << *first;
}
const auto regex_expr =
    "https?://(?:www\\.|ww2\\.)?"           // Scheme and optional prefix (dots escaped)
    "((?:[-[:alnum:]]+\\.)+[[:alpha:]]+)";  // Domain name
int main() {
  std::ios_base::sync_with_stdio(false);
  std::cin.tie(nullptr);
  const auto text = get_text();
  std::regex regex(regex_expr);
  const auto begin = std::sregex_iterator(text.begin(), text.end(), regex);
  const auto end = std::sregex_iterator();
  std::set<std::string> domain_names;
  std::for_each(begin, end,
                [&](const std::smatch& m) { domain_names.insert(m[1]); });
  print_delimited(domain_names.begin(), domain_names.end(), ';');
  std::cout << '\n';
}

HackerRank Detect the Domain Name Solutions in Java

import java.util.*;
import java.util.regex.*;
public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // Group 1: scheme. Group 2: optional "www." or "ww2." prefix (dots escaped).
        // Groups 3 and 4: the domain name and its final label. Group 5: optional path.
        String format = "(http|https)://(www\\.|ww2\\.|)([a-zA-Z0-9.-]+)(\\.[a-zA-Z]+)(/\\S*)?";
        Pattern pattern = Pattern.compile(format);
        // A TreeSet keeps the domains unique and in lexicographical order.
        Set<String> links = new TreeSet<String>();
        int testcase = in.nextInt();
        in.nextLine(); // Consume the rest of the line that holds N
        for (int i = 0; i < testcase; i++) {
            String assessed = in.nextLine();
            Matcher match = pattern.matcher(assessed);
            while (match.find()) {
                links.add(match.group(3) + match.group(4));
            }
        }
        System.out.println(String.join(";", links));
    }
}

HackerRank Detect the Domain Name Solutions in Python

# Enter your code here. Read input from STDIN. Print output to STDOUT
import re
# A URL must follow '=', a single quote, or a double quote (for example inside
# an href attribute). An optional "www."-style prefix is skipped and only the
# domain name is captured.
pattern = re.compile(r"[='\"]https?://(?:ww[w0-9]\.)?([0-9a-zA-Z][0-9a-zA-Z_.-]+\.[a-zA-Z]+)")
N = int(input().strip())
tags = set()
for _ in range(N):
    line = input().strip()
    tags.update(pattern.findall(line))
print(';'.join(sorted(tags)))

HackerRank Detect the Domain Name Solutions in JavaScript

function processData(input) {
    var lines = input.split('\n');
    var N = parseInt(lines.shift(), 10);
    var text = lines.join(' ');
    var domainREStr = 'https?://(?:ww[a-zA-Z0-9_-]+\\.)?([a-zA-Z0-9_-]+(?:\\.[a-zA-Z0-9_-]+)+)[/?"\']';
    var re = new RegExp(domainREStr, 'ig');
    var domains = {};
    var arr = null;
    while ((arr = re.exec(text)) != null) {
        domains[arr[1].trim()] = 0;
    }
    var res = [];
    for (var i in domains) {
        res.push(i);
    }
    res.sort();
    process.stdout.write(res.join(';') + '\n');
}
process.stdin.resume();
process.stdin.setEncoding("ascii");
var _input = "";
process.stdin.on("data", function (input) {
    _input += input;
});
process.stdin.on("end", function () {
    processData(_input);
});

HackerRank Detect the Domain Name Solutions in PHP

<?php
$lin = (int) fgets(STDIN); // Number of lines in the fragment
$content="";
for($i=0;$i<$lin;$i++){
    $content .= fgets(STDIN);
}
$urls= array();
// Capture the host part of every http:// or https:// URL
if (preg_match_all('/https{0,1}\:\/\/([\.a-z0-9\-]+)/im',$content,$matches)){
    foreach($matches[1] as $urlunfiltered){
        // Scrub a leading "www." or "ww<digits>." prefix
        $urlfiltered=preg_replace("/^www\./i","",$urlunfiltered);
        $urlfiltered=preg_replace("/^ww[0-9]+\./i","",$urlfiltered);
        // Keep only hosts that contain at least one dot
        if (preg_match("/[a-z0-9]+\.[a-z0-9]+/i",$urlfiltered)){
            $urls[($urlfiltered)]=1;
        }
    }
}
$urls=array_keys($urls);
sort($urls);
echo implode(";",$urls);
?>

Disclaimer: This problem (Detect the Domain Name) was created by HackerRank, but the solutions are provided by BrokenProgrammers. This tutorial is for educational and learning purposes only.
