
Azure Machine Learning Studio: Multiple Language Named Entity Recognition (NER) Text Analysis

Text analysis is a branch of machine learning that extracts insights from textual data. Examples include sentiment or emotion analysis, which infers a writer's attitude from the tone of the text, and Named Entity Recognition (NER), which identifies people, organizations and locations as separate entities. Many tools and languages are available for writing and processing machine learning models, such as Python, R, Azure Machine Learning Studio and IBM's machine learning tools. Python is one of the most popular languages for this work.


Today, I shall demonstrate the Azure Machine Learning Studio Named Entity Recognition (NER) module by extracting people, location and organization entities from a textual dataset in the Urdu language. Note that the NER module currently supports English text only and recognizes only three entity types: people, locations and organizations. However, I will demonstrate a very simple technique that lets you use the module with text in any language. I am using Urdu as the base case here, but you can choose any other language.

Prerequisites

You need the following before you proceed with this tutorial:
  1. Knowledge of Azure Machine Learning Studio.
  2. A registered Azure Machine Learning Studio free account.
  3. A basic understanding of the machine learning Named Entity Recognition (NER) concept.
  4. Knowledge of writing SQLite queries.
You can download the complete SQLite query source code and the sample pre-processed dataset for this tutorial. The sample dataset comes from the MWaseemRandhawa GitHub account.

Download Now!

Let's begin now. 

1) The Azure Machine Learning Studio Named Entity Recognition (NER) module currently supports English only. Therefore, to perform NER analysis on non-English text, the first step is to translate the text into English using any suitable translation API, e.g. the Google Translation API or the Bing translation API. I converted my Urdu text dataset into an English text dataset using the Google Translation API.
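The pre-processing in this step can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the source column name `summery_urdu`, the helper `add_english_column` and the stand-in `demo_translate` function are hypothetical and are not part of the sample dataset or of any translation API. In practice you would pass in a function that calls the translation service of your choice. The sketch also attaches a zero-based article ID to each row, which is needed later when the results are joined back.

```python
from typing import Callable, Dict, List


def add_english_column(
    rows: List[Dict[str, object]],
    translate: Callable[[str], str],
    source_col: str = "summery_urdu",  # hypothetical name of the Urdu column
    target_col: str = "summery_eng",   # English column used later by the NER module
) -> List[Dict[str, object]]:
    """Return copies of the rows with an article ID and an English translation added."""
    result = []
    for article_id, row in enumerate(rows):  # IDs start at 0, in dataset order
        new_row = dict(row)
        new_row["article_id"] = article_id
        new_row[target_col] = translate(str(row[source_col]))
        result.append(new_row)
    return result


# Stand-in translator for illustration only; replace it with a real API call.
_DEMO_TRANSLATIONS = {"لاہور پاکستان کا شہر ہے": "Lahore is a city of Pakistan"}


def demo_translate(text: str) -> str:
    """Look up a canned translation, falling back to the input text."""
    return _DEMO_TRANSLATIONS.get(text, text)


rows = [{"summery_urdu": "لاہور پاکستان کا شہر ہے"}]
translated = add_english_column(rows, demo_translate)
print(translated[0]["article_id"], translated[0]["summery_eng"])
```

Keeping the translator as a plain function argument means the same pre-processing works with any translation service and any source language.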

2) The next step is to import the pre-processed dataset into Azure Machine Learning Studio: log in to your Azure Machine Learning Studio account and import the pre-processed dataset.





3) Now, create a new empty experiment and name it "Multiple Language Named Entity Recognition (NER)".



4) In the right pane, search for your imported dataset and drag and drop it onto the experiment window. Then right-click the module and choose Dataset->Visualize to view your dataset.



The visualization shows both the original Urdu text and the translated English text of my dataset.

5) Now, search for the "Select Columns in Dataset" module and select only the "summery_eng" column, since the NER module is applied to a single text column.




6) Now, search for the "Named Entity Recognition" module and connect the English text column selected in the previous step as its input. Notice that the Named Entity Recognition module does not provide any configuration options.


7) Run the experiment and then visualize the results that the Named Entity Recognition module has produced.




The results show that the Named Entity Recognition module extracts person, location and organization entities from my selected text column. If the text contains multiple entities, each entity is expanded into a new row. Notice that an Article ID is attached to each entity. The Article ID is auto-generated in the same order as the rows of the provided dataset and starts at 0. In my sample dataset, row 0 does not contain any entity, so that row is not included in the NER result. Similarly, row 3 of my dataset contains multiple entities, so each entity is expanded into a new row, but all of them carry the same Article ID. "Offset" is the starting position at which the recognized entity is found, "Length" is the size of the recognized entity including any spaces, and "Type" is the category of the recognized entity: person, location or organization.
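To make the meaning of these columns concrete, here is a small Python sketch. The two articles and the NER rows are invented for illustration and are not taken from the sample dataset. The column names (`Article`, `Mention`, `Offset`, `Length`, `Type`) and the type codes `PER`, `LOC` and `ORG` are my assumption of what the module emits, so check them against your own output.

```python
from collections import defaultdict

# Invented input rows; article 0 deliberately contains no entities.
articles = [
    "The weather was pleasant today.",
    "Ali works at Microsoft in Lahore.",
]

# Invented NER-style output: one row per recognized entity.
ner_rows = [
    {"Article": 1, "Mention": "Ali", "Offset": 0, "Length": 3, "Type": "PER"},
    {"Article": 1, "Mention": "Microsoft", "Offset": 13, "Length": 9, "Type": "ORG"},
    {"Article": 1, "Mention": "Lahore", "Offset": 26, "Length": 6, "Type": "LOC"},
]


def mention_from_span(text: str, offset: int, length: int) -> str:
    """Recover an entity from its start position and size within the text."""
    return text[offset:offset + length]


def group_by_article(rows):
    """Collect (type, mention) pairs under the article they came from."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Article"]].append((row["Type"], row["Mention"]))
    return dict(grouped)


grouped = group_by_article(ner_rows)
for row in ner_rows:
    text = articles[row["Article"]]
    print(row["Article"], row["Type"], mention_from_span(text, row["Offset"], row["Length"]))
```

Article 0 never appears in `grouped` because it produced no rows, while article 1 appears once per entity in `ner_rows` with the same ID.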

8) For the next step, I want to combine the resultant NER dataset with my existing dataset as a sparse matrix, in which one new column represents person entities, a second new column represents locations and a third new column represents organizations. To do this, search for the "Apply SQL Transformation" module. In it, I have written a SQLite query that groups the entities of each article into a single row by Article ID and splits the Type column into three columns. You can download the query from the link provided above in this article. Connect the "Apply SQL Transformation" module to the output of the NER module and run the experiment.




In the result, I have separated each original row with the "|" symbol and combined its column values with the comma "," symbol.
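The kind of query used in this step can be sketched with Python's built-in `sqlite3` module. This is not the downloadable query from this article, only a hedged approximation. It assumes the NER output has the columns `Article`, `Mention`, `Offset`, `Length` and `Type` with type codes `PER`, `LOC` and `ORG`, that the first input of "Apply SQL Transformation" is exposed as table `t1`, and that "|" separates entities while "," joins the mention, offset and length of one entity. The sample rows are invented.

```python
import sqlite3

# Pivot query: one row per article, one column per entity type.
# CASE without ELSE yields NULL, and group_concat skips NULL values.
PIVOT_QUERY = """
SELECT
    Article,
    group_concat(CASE WHEN Type = 'PER'
                      THEN Mention || ',' || "Offset" || ',' || "Length" END, '|') AS person,
    group_concat(CASE WHEN Type = 'LOC'
                      THEN Mention || ',' || "Offset" || ',' || "Length" END, '|') AS location,
    group_concat(CASE WHEN Type = 'ORG'
                      THEN Mention || ',' || "Offset" || ',' || "Length" END, '|') AS organization
FROM t1
GROUP BY Article
ORDER BY Article;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 (Article INTEGER, Mention TEXT, "Offset" INTEGER, '
    '"Length" INTEGER, Type TEXT)'
)
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ali", 0, 3, "PER"),
        (1, "Microsoft", 13, 9, "ORG"),
        (1, "Lahore", 26, 6, "LOC"),
        (2, "Sara", 0, 4, "PER"),   # article 2 has two people and nothing else
        (2, "Ahmed", 9, 5, "PER"),
    ],
)

pivot = conn.execute(PIVOT_QUERY).fetchall()
for row in pivot:
    print(row)
```

Articles with several entities of the same type get them in one cell separated by "|", and entity types that do not occur in an article are left as NULL, which is what makes the result sparse. SQLite does not guarantee the order of values inside `group_concat`, so do not rely on it.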

9) Let's combine our input dataset with the resultant dataset to form the sparse matrix. Note that I already attached an Article ID to my input dataset during the pre-processing data transformation step. Search for the "Join Data" module, choose "Left Join" and combine the two datasets on the Article ID.
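The effect of the left join can be illustrated outside the Studio with `sqlite3`. The "Join Data" module itself is configured in the user interface, so the SQL below is only an equivalent sketch. The table names `t1` and `t2`, the key column name `article_id` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- t1: pre-processed input dataset with the article ID attached
    CREATE TABLE t1 (article_id INTEGER, summery_eng TEXT);
    -- t2: pivoted NER result, one row per article that has entities
    CREATE TABLE t2 (Article INTEGER, person TEXT, location TEXT, organization TEXT);

    INSERT INTO t1 VALUES (0, 'The weather was pleasant today.');
    INSERT INTO t1 VALUES (1, 'Ali works at Microsoft in Lahore.');
    INSERT INTO t2 VALUES (1, 'Ali,0,3', 'Lahore,26,6', 'Microsoft,13,9');
    """
)

# LEFT JOIN keeps every input row, even when no entity was recognized.
JOIN_QUERY = """
SELECT t1.article_id, t1.summery_eng, t2.person, t2.location, t2.organization
FROM t1
LEFT JOIN t2 ON t1.article_id = t2.Article
ORDER BY t1.article_id;
"""

joined = conn.execute(JOIN_QUERY).fetchall()
for row in joined:
    print(row)
```

A left join is the natural choice here because rows without any entity, such as article 0, must stay in the final dataset with empty entity columns. An inner join would silently drop them.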




10) Finally, download your resultant dataset as a CSV file. Search for the "Convert to CSV" module, connect it to the joined dataset and download the result.
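One detail is worth keeping in mind when you read the downloaded file elsewhere: if the entity cells contain commas, as in the "mention,offset,length" layout assumed in the sketches for this tutorial, they must be quoted in the CSV. The following Python sketch, with invented rows and a hypothetical helper `to_csv`, shows that the standard `csv` module quotes such cells and reads them back intact.

```python
import csv
import io

FIELDS = ["article_id", "summery_eng", "person", "location", "organization"]

# Invented rows in the shape of the joined dataset; None marks a missing entity.
rows = [
    {"article_id": 0, "summery_eng": "The weather was pleasant today.",
     "person": None, "location": None, "organization": None},
    {"article_id": 1, "summery_eng": "Ali works at Microsoft in Lahore.",
     "person": "Ali,0,3", "location": "Lahore,26,6", "organization": "Microsoft,13,9"},
]


def to_csv(records, fieldnames):
    """Serialize the records to CSV text; None is written as an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, FIELDS)

# Read the text back to confirm the comma-containing cells survive the round trip.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed[1]["person"])
```

Splitting the downloaded file on commas by hand would break these cells, so a proper CSV reader is the safer choice.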


Conclusion

In this article, you learned a technique for extracting people, location and organization entities from a textual dataset in any language using the Azure Machine Learning Studio Named Entity Recognition (NER) module. You also learned to use the "Apply SQL Transformation" module to reshape the NER results, the "Join Data" module to combine two datasets with a Left Join, and the "Convert to CSV" module to download your resultant dataset in CSV file format.
