HackerRank Detect HTML links Solution

Hello Programmers! In this post, you will learn how to solve the HackerRank Detect HTML links problem. This problem is part of the HackerRank Regex series.

One more thing: don't jump straight to the solutions. First try to solve HackerRank problems yourself, and look at the solutions only if you are still stuck after several attempts.

Task

Charlie has been given an assignment by his professor to strip the links and the text names from HTML pages.
An HTML link is of the form,

Here, a is the tag and href is the attribute that holds the link Charlie is interested in. The text name is HackerRank.

Charlie notices that the text name can sometimes be hidden within multiple tags.

Here, the text name is hidden inside the tags h1 and b.

Help Charlie in listing all the links and the text name of the links.
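
As a rough illustration of the task, the sketch below pulls the link and the text name out of an anchor whose text sits inside nested tags. The fragment, the `extract_links` helper, and the pattern are assumptions made up for this example, not part of the original problem statement, and a regex like this is only expected to cope with simple, well-formed anchors.

```python
import re

# Assumed pattern: group 1 is the href value, group 2 is everything up to </a>.
ANCHOR = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
TAG = re.compile(r'<[^>]+>')


def extract_links(html):
    """Return (link, text name) pairs for each anchor found in html."""
    pairs = []
    for match in ANCHOR.finditer(html):
        link = match.group(1).strip()
        # Drop nested tags such as <h1> or <b>, then trim the remaining text.
        text = TAG.sub('', match.group(2)).strip()
        pairs.append((link, text))
    return pairs


# Hypothetical fragment, made up for illustration.
sample = '<p><a href="http://example.com/page"><h1><b> Example </b></h1></a></p>'
print(extract_links(sample))
```

The `\s` right after `a` is a deliberate choice: it keeps tags such as `<abbr>` from being mistaken for anchors.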

Input Format

The first line contains the number of lines in the fragment (N). This is followed by N lines from a valid HTML document or fragment.
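
One way to read this input, sketched in Python 3: take the first line as N, then keep only the next N lines. The `read_fragment` helper and the demo text are assumptions for illustration.

```python
import io


def read_fragment(stream):
    """Read N from the first line, then return the next N lines without newlines."""
    n = int(stream.readline())
    return [stream.readline().rstrip('\n') for _ in range(n)]


# Hypothetical input; a real run would pass sys.stdin instead of StringIO.
demo = io.StringIO('2\n<p>first line</p>\n<p>second line</p>\n')
print(read_fragment(demo))
```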

Constraints

N < 100
Number of characters in the test fragments <= 10000 characters.
Characters will be restricted to ASCII. Fragments for the tests will be picked up from Wikipedia. Also, some tests might not have text or names on the links.

Output Format

If there are M links in the document, display each of them in a new line. The link and the text name must be separated by a “,” (comma) with no spaces between them.
Strip out any extra spaces at the start and end position of both the link and the text name before printing.

link-1,text name-1
link-2,text name-2
link-3,text name-3
….
link-M,text name-M
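
The formatting rule can be sketched as a small helper: trim both fields and join them with a bare comma. `format_link` is a hypothetical name and the sample values are made up. Under this rule, a link with no text name would presumably print as the link followed by the comma.

```python
def format_link(link, text):
    """Trim both fields and join them with a comma and no surrounding spaces."""
    return link.strip() + ',' + text.strip()


# Made-up values for illustration.
print(format_link(' /wiki/Main_Page ', ' Main Page '))  # /wiki/Main_Page,Main Page
print(format_link('/wiki/Empty', ''))                   # /wiki/Empty,
```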

Sample Input

Sample Input:1

Sample Input:2

Sample Output

Sample Output:1

Sample Output:2

Explanation

Viewing Submissions

You can view others’ submissions if you solve this challenge. Navigate to the challenge leaderboard.

Solution in Java

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution {
    public static void main(String[] args) {
        Scanner scin = new Scanner(System.in);
        int n = scin.nextInt();
        scin.nextLine(); // consume the rest of the first line

        // Group 1: href value; group 2: everything between <a ...> and </a>.
        // The "\\s" after "a" keeps tags such as <abbr> from matching, and
        // "[^>]*" keeps the href search inside the opening tag.
        Pattern patt = Pattern.compile("<\\s*a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>");

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            Matcher m = patt.matcher(scin.nextLine());
            while (m.find()) {
                sb.append(m.group(1).trim());                          // link
                sb.append(",");
                sb.append(m.group(2).replaceAll("<.*?>", "").trim());  // text name, nested tags removed
                sb.append("\n");
            }
        }
        System.out.print(sb);
    }
}

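Solution in Python

import re


def main():
    # Group 1: href value; group 2: link text after skipping nested opening tags such as <h1><b>.
    # "\w+[^>]*" accepts multi-character tag names; the original "\w" only matched tags like <b>.
    pattern = re.compile(r'<\s*a\s+href\s*=\s*"([^"]*)"[^>]*>(?:<\s*\w+[^>]*>)*\s*([^<]*)')
    n = int(input())
    for _ in range(n):
        for m in pattern.finditer(input()):
            # Trim both fields, as the output format requires.
            print("%s,%s" % (m.group(1).strip(), m.group(2).strip()))


if __name__ == "__main__":
    main()
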
Solution in JavaScript

process.stdin.resume();
process.stdin.setEncoding("ascii");
var input = "";
// Collect all of stdin first; the "data" event may fire more than once.
process.stdin.on("data", function (chunk) {
	input += chunk;
});
process.stdin.on("end", function () {
	// Drop the line count and join the fragment into one string.
	var html = input.split('\n').slice(1).join('');
	// Group 1: href value; group 2: everything between the opening tag and </a>.
	var r = /<\s*a\s+href=['"]([^'"]+)['"][^>]*>\s*(.*?)\s*<\s*\/\s*a>/ig;
	var m;
	// exec() returns null when there are no more matches, so an input without links prints nothing.
	while ((m = r.exec(html)) !== null) {
		// Remove nested tags from the text name, then trim it.
		var title = m[2].replace(/<[^>]+>/g, '').trim();
		console.log(m[1].trim() + ',' + title);
	}
});

Solution in PHP

<?php
$_fp = fopen("php://stdin", "r");
// First line: number of lines in the fragment.
fscanf($_fp, "%d", $m);
$lines = array();
for ($i = 0; $i < $m; $i++) {
    $lines[] = trim(fgets($_fp));
}
// Group 1: href value; group 2: everything up to the closing </a>.
$search = '/<\s*a\s[^>]*href="([^"]*)"[^>]*>((?:(?!<\/a>).)*)<\/a>/i';
$matches = array();
if (preg_match_all($search, implode('', $lines), $matches)) {
    foreach ($matches[1] as $i => $link) {
        // strip_tags() removes nested tags from the text name.
        print trim($link) . ',' . trim(strip_tags($matches[2][$i])) . PHP_EOL;
    }
}

Disclaimer: The Detect HTML links problem was created by HackerRank; the solutions are provided by BrokenProgrammers. This tutorial is for educational and learning purposes only.
