How do I split a paragraph into sentences?

I am working with some paragraphs that can contain:

  • Sentences
  • Headings
  • Emails
  • Decimal numbers

I am trying to split the paragraph into sentences. So, for example with this input:

Cary Nelson and Stephen Watt. Martin Horton-Eddison. "First Class Essays" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. "Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. "Scholarly Books" and "Peer Review" in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]

I am trying to get this output:

Cary Nelson and Stephen Watt.

Martin Horton-Eddison. "First Class Essays" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King.

"Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000.

ISBN 0-87111-507-7.

"Scholarly Books" and "Peer Review" in Academic Keywords: A Devil's Dictionary for Higher Education.

ISBN 0-415-92203-8.

[email protected]

I have tried using this regular expression, but it is not matching in the way I am expecting.

string[] sentences = Regex.Split(strNew, @"(?<=[.!?])\s+(?=[A-Z])");

--------------Solutions-------------

private void parseParagraph(string input)
{
    // Split wherever a period is followed by a space; the ". " separator is dropped.
    string[] lines = input.Split(new[] { ". " }, StringSplitOptions.None);
    foreach (string line in lines)
    {
        Console.WriteLine(line.Trim());
    }
}

This would be one way to approach it.

Can't you just split the string on the '.' character?

string text = "Cary Nelson and Stephen Watt. Martin Horton-Eddison. \"First Class Essays\" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. \"Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. \"Scholarly Books\" and \"Peer Review\" in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]";

string[] myParagraph = text.Split('.');

String Paragraph = @"Cary Nelson and Stephen Watt. Martin Horton-Eddison. First Class Essays Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. Scholarly Books and Peer Review in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]";
String[] sep = {". "};
String[] opt = Paragraph.Split(sep, StringSplitOptions.RemoveEmptyEntries);

string para = @"Cary Nelson and Stephen Watt. Martin Horton-Eddison. ""First Class Essays"" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. ""Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. ""Scholarly Books"" and ""Peer Review"" in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]";

string[] sentences = para.Split(new string[] { ". " }, StringSplitOptions.None);
for (int i = 0; i < sentences.Length; i++)
{
    Console.WriteLine(sentences[i]);
}

You need to split on a dot followed by a space (". "), because otherwise the email address would be broken apart. Normally there is a space after every sentence.

Happy coding.

Here is an example of what you want:

String Paragraph = "Cary Nelson and Stephen Watt. Martin Horton-Eddison. \"First Class Essays\" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. \"Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. \"Scholarly Books\" and \"Peer Review\" in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]"; // Given paragraph

String[] opt = Paragraph.Split('.'); // Split into pieces on the '.' character
String mail = "";
bool mailflag = false;
foreach (String Row in opt) // Iterate over each piece in the array opt
{
    if (Row.Contains("@") || mailflag)
    {
        if (mailflag)
        {
            // Rejoin the two halves of the email address that Split('.') separated
            Console.WriteLine(mail + "." + Row);
            mailflag = false;
            mail = "";
        }
        else
        {
            mail = Row;
            mailflag = true;
        }
    }
    else
    {
        Console.WriteLine(Row + "\n"); // Prints each piece followed by a blank line; use Console.WriteLine(Row) if you want a single line break
    }
}

OR

String Paragraph = @"Cary Nelson and Stephen Watt. Martin Horton-Eddison. ""First Class Essays"" Hull, United Kingdom : Purple Peacock Press, 2012 Carol Tenopir and Donald King. ""Towards Electronic Journals: Realities for Librarians and Publishers. SLA, 2000. ISBN 0-87111-507-7. ""Scholarly Books"" and ""Peer Review"" in Academic Keywords: A Devil's Dictionary for Higher Education. ISBN 0-415-92203-8. [email protected]"; // Given paragraph
String[] Rows = Paragraph.Split(new string[] { ". " }, StringSplitOptions.None);
foreach (String Row in Rows)
{
    Console.WriteLine(Row + "\n"); // Prints each piece followed by a blank line; use Console.WriteLine(Row) if you want a single line break
}

private void SeparateString(string input)
{
    string[] stringSplit = input.Split('.');
    for (int i = 0; i < stringSplit.Length; i++)
    {
        if (stringSplit[i].Contains("@"))
        {
            // Keep the email address on one line by not breaking after this piece.
            Console.Write(stringSplit[i] + ".");
        }
        else
        {
            Console.WriteLine(stringSplit[i] + ". " + Environment.NewLine);
        }
    }
}

This gets close to your desired result.
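
Another option, offered only as a rough sketch: the regular expression in the question requires a capital letter after the whitespace, so it will not split before anything that starts with another character, such as a quotation mark. Dropping that lookahead and splitting on whitespace that follows '.', '!' or '?' keeps the punctuation attached to each sentence and leaves emails and decimal numbers intact, because they contain no whitespace after the dot. The class and method names below are hypothetical, not taken from any answer here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class SentenceSplitter
{
    // Splits on runs of whitespace that directly follow '.', '!' or '?'.
    // The lookbehind is zero-width, so the punctuation stays with its sentence.
    public static string[] SplitSentences(string paragraph)
    {
        return Regex.Split(paragraph.Trim(), @"(?<=[.!?])\s+");
    }
}
```

You would call it as SentenceSplitter.SplitSentences(text) and print each element. It splits after every terminator followed by whitespace, so it will not reproduce the exact grouping shown in the question, and abbreviations such as "Dr. " would still be split.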

Category:c# Time:2018-11-16 Views:5
Tags: regex
