SRP Extract Xml

sbotes · March 2015

I have been trying to extract data from a response returned by a SOAP server. Namespaces are being used, and I have tried every combination I can think of. Any help with the syntax would be appreciated. The following is what I am working with. I have verified the response to be as shown.

This is the response.


<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
     <HelloWorldResponse xmlns="http://tempuri.org/">
        <HelloWorldResult>not authorized</HelloWorldResult>
     </HelloWorldResponse>
  </soap:Body>
</soap:Envelope>

Here are some of the strings I have tried.


Xpath      = "/soap:Envelope/soap:Body/soap:HelloWorldResponse/MR:HelloWorldResult/text()"	
Xpath      = "/Envelope/Body/HelloWorldResponse/HelloWorldResult/text()"    
Xpath      = "/soap:Envelope/soap:Body/soap:HelloWorldResponse/HelloWorldResult/text()"
NameSpace  = 'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" '
NameSpace := 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
NameSpace := 'xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
NameSpace := 'xmlns="http://tempuri.org/"'
Response   = SRP_Extract_Xml(MRResponse, Xpath, NameSpace)

I do not get any errors, just null returned.

I think the apparently unused namespaces, xsi and xsd, may be confusing me. There are also references to the default namespace in both the SRP documentation and the w3schools namespace help, and I am not sure that I am using them correctly.

KevinFournier · March 2015

Try this:


Xpath = "/soap:Envelope/soap:Body/t:HelloWorldResponse/t:HelloWorldResult/text()"

NameSpace  = 'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" '
NameSpace := 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
NameSpace := 'xmlns:xsd="http://www.w3.org/2001/XMLSchema" '
NameSpace := 'xmlns:t="http://tempuri.org/"'

Response   = SRP_Extract_Xml(MRResponse, Xpath, NameSpace)

Default namespaces confuse XPath when they are set deep inside the XML. So, I simply named the default namespace (the tempuri one) to "t", then I used "t:" in the Xpath query. That seemed to work for me.
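
The same idea can be tried outside OpenInsight. What follows is a minimal sketch using Python's standard `xml.etree.ElementTree`, not SRP_Extract_Xml itself, so it only illustrates the prefix technique. It assumes the response posted above; the prefix `t` is an arbitrary name chosen for the query and does not appear in the document. Only the prefixes used in the query are declared here, which is an ElementTree behaviour and may not match what SRP_Extract_Xml requires.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
     <HelloWorldResponse xmlns="http://tempuri.org/">
        <HelloWorldResult>not authorized</HelloWorldResult>
     </HelloWorldResponse>
  </soap:Body>
</soap:Envelope>"""

# Prefixes used by the query. "t" is bound to the namespace that the
# document declares as its default on HelloWorldResponse.
NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "t": "http://tempuri.org/",
}

root = ET.fromstring(RESPONSE)  # root is the soap:Envelope element

# ElementTree paths are relative to the root element, so start at soap:Body.
prefixed = root.find("soap:Body/t:HelloWorldResponse/t:HelloWorldResult", NAMESPACES)

# The same path without the "t:" prefix looks for elements in no namespace.
unprefixed = root.find("soap:Body/HelloWorldResponse/HelloWorldResult", NAMESPACES)

result_text = prefixed.text if prefixed is not None else None
print(result_text)   # text of the element found through the prefixed path
print(unprefixed)    # result of the unprefixed lookup
```

In this sketch the prefixed path finds the element and the unprefixed one does not, which matches the null result described earlier in the thread.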

sbotes · March 2015

That worked. Thank you.

I had taken your example from the documentation, but with mr rather than the t you are showing, and could not get it to work. I believe I must have left the namespace identifier off of the last tag. I went back and tested, and it worked with mr.

I am wondering if there is an issue with default values. I found documentation indicating that a namespace definition such as the one in this response would be the default, and thus you could leave the prefix off the tags.

Thanks for the clarification.

Off to being productive!

KevinFournier · March 2015

I think it's an issue of the default namespace changing inside the XML. In other words, xmlns is "" at the beginning of the XML and becomes "http://tempuri.org/" later in the XML. This makes it hard for XPath to determine what the tags are. That's why I changed the XPath to use the "t" namespace, in order to just ignore the default namespace issues altogether.
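
One way to see which namespace each element actually ends up in is to parse the response and list the expanded tag names. The sketch below uses Python's standard `xml.etree.ElementTree` rather than SRP_Extract_Xml, so it only illustrates the XML side of the issue. For background, in XPath 1.0 an unprefixed name in a query refers to an element in no namespace, so a default namespace declared inside the document is not picked up by an unprefixed path.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
     <HelloWorldResponse xmlns="http://tempuri.org/">
        <HelloWorldResult>not authorized</HelloWorldResult>
     </HelloWorldResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(RESPONSE)

# ElementTree reports every tag as "{namespace-uri}local-name", which makes
# the effect of the nested default namespace declaration visible.
tags = [element.tag for element in root.iter()]

for tag in tags:
    print(tag)
```

The two inner elements carry the tempuri namespace even though they have no prefix in the document, so a query has to name that namespace somehow, for example through a prefix such as "t".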

sbotes · March 2017

Well I thought I had this stuff figured out..... but I am frustrated.

Kevin, will you please assist with this XML string? It is the return from a failed HTTP post. It looks like it has a namespace. I have tried numerous ways to get at what appears to be simple. All I want is to pick up the div entries with the id attributes header and content.

I presumed that this piece (it gets modified when I preview this, which probably means that it will post modified, so an image is attached):

<html xmlns="http://www.w3.org/1999/xhtml">
meant I needed to provide a namespace. I tried every which way using your examples from two years ago. Below is one that did not work.

I tried with and without a NameSpace.

Would you please show me how to access the div entries in the body section.

Ways I tried


NameSpace  = 'xmlns:t="http://www.w3.org/1999/xhtml"'
NameSpace  = 'xmlns="http://www.w3.org/1999/xhtml"'

Different xpath statements


Xpath = "/t:html/t:body/t:div/text()"
Xpath = "/xmlns:html/xmlns:body"
Xpath = '/html/body/div'
Xpath = '/html/body/div'
aaxhtml 	= SRP_Extract_Xml(ErrorResponseBody, xPath, NameSpace )

Actual data ******************


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<title>404 - File or directory not found.</title>
<style type="text/css">
<!--
body{margin:0;font-size:.7em;font-family:Verdana, Arial, Helvetica, sans-serif;background:#EEEEEE;}
fieldset{padding:0 15px 10px 15px;} 
h1{font-size:2.4em;margin:0;color:#FFF;}
h2{font-size:1.7em;margin:0;color:#CC0000;} 
h3{font-size:1.2em;margin:10px 0 0 0;color:#000000;} 
#header{width:96%;margin:0 0 0 0;padding:6px 2% 6px 2%;font-family:"trebuchet MS", Verdana, sans-serif;color:#FFF;
background-color:#555555;}
#content{margin:0 0 0 2%;position:relative;}
.content-container{background:#FFF;width:96%;margin-top:8px;padding:10px;position:relative;}
-->
</style>
</head>
<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
 <div class="content-container"><fieldset>
  <h2>404 - File or directory not found.</h2>
  <h3>The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.</h3>
 </fieldset></div>
</div>
</body>
</html>

KevinFournier · March 2017

This is not XML, so I have doubts SRP_Extract_XML is ever going to work with this.

sbotes · March 2017

Well that would explain why I can't get it to work. )-: This is data that is returned when I make a request to get the additional error detail. Httpclient_Services("GetResponseBody")

I have purposely used a bad request in a REST API post. I am most interested in the section starting with the tag .

If it is not XML then can you tell me what it is and how I might extract what I need. Knowing what I am working with would help.

DonBakke · March 2017

This is just plain HTML, which is expected when the server cannot find the URL you requested.
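
If the text of the header and content divs is still wanted from a page like this, an HTML parser is one option. The sketch below is an assumption-laden illustration in Python's standard `html.parser`, not an SRP or OpenInsight feature: `DivTextCollector` is a hypothetical helper, and the sample is a trimmed copy of the body posted above.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text found inside each <div> that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.text_by_id = {}   # div id -> list of text fragments
        self._open_ids = []    # one entry per open <div>; None if it has no id

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._open_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag == "div" and self._open_ids:
            self._open_ids.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing div that has an id.
        for div_id in reversed(self._open_ids):
            if div_id is not None:
                self.text_by_id.setdefault(div_id, []).append(text)
                break


ERROR_PAGE = """<body>
<div id="header"><h1>Server Error</h1></div>
<div id="content">
 <div class="content-container"><fieldset>
  <h2>404 - File or directory not found.</h2>
  <h3>The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.</h3>
 </fieldset></div>
</div>
</body>"""

collector = DivTextCollector()
collector.feed(ERROR_PAGE)
collector.close()

# Join the fragments so each div id maps to one line of text.
div_text = {div_id: " ".join(parts) for div_id, parts in collector.text_by_id.items()}
print(div_text["header"])
print(div_text["content"])
```

This only works as long as the server keeps using those div ids, so it is best treated as a convenience for logging rather than something to build logic on.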

sbotes · March 2017

Kevin, Now I understand that this is a web page that is being returned from my forced error. I also now understand that this will be different depending on the target URL and the service behind it. Thus other than perhaps opening a web page and displaying this there is not much I can do. It will always be a moving target as far as format goes. And of course in an automated production environment popping a web page up and looking for human interaction would not be a good choice. (-:

Thanks for taking the time to respond.

DonBakke · March 2017

Ahem...that last response was from me. :)

Yes, it will be a moving target but you can get some measure of control if you check a few things up front. For instance, the GetResponseHeaderField service using the argument "Content-Type" should tell you if the response is application/xml or text/html. You should also use the GetResponseStatusCode service to check for common codes (like 404).
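
The up-front checks described above can be sketched as a small decision function. This is an illustration in Python, not the HTTPClient_Services API: `classify_response` is a hypothetical helper, and it assumes the status code and the Content-Type header value have already been fetched by whatever means the client provides.

```python
def classify_response(status_code, content_type):
    """Return "error" for a non-2xx status, otherwise "xml", "html" or "other"."""
    if not 200 <= status_code < 300:
        # For example a 404: the body is probably an error page, not data.
        return "error"
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    if media_type == "text/html":
        return "html"
    return "other"


print(classify_response(404, "text/html"))
print(classify_response(200, "application/xml; charset=utf-8"))
```

Only a response classified as "xml" would then be handed to the XML extraction step; anything else can be logged with its status code instead of being parsed.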

JimLeong · April 2018

Hi. I have been using your XML utility to extract my XML. My trading partner decided to change the format, and now I am unable to extract. They added the namespace on the ZDNOTICE tag. Any suggestions?

DonBakke · April 2018

I assume you were posting XML but it appears that all the tags got stripped. Can you post it again but try using the Code formatting option?

JimLeong · April 2018

Hi Don. Thanks for the quick response.

Excuse my ignorance. How do I use the Code formatting option?

Normally I would use result = SRP_Extract_Xml(xmlRec, "/ZDNOTICE")

DonBakke · April 2018

I meant use the forum Code formatting feature. When you post your XML, select it and then click the dropdown in the toolbar with the Paragraph symbol. Pick "Code". Your XML will look like this:

<xml>
<value>foo</value>
</xml>

JimLeong · April 2018

Hope this works

<?xml version="1.0"?>
<ZDNOTICE xmlns="http://www.cbsa-asfc.gc.ca/ARL/ZARL/2017-11" file_split="E01" file_type="B" file_name="DN-101178382-20170425140341" version="201711">
<HEADER>
<MESSAGE_EN>English Message</MESSAGE_EN>
<MESSAGE_FR>French Message</MESSAGE_FR>
<PARTY>
<BN9>101178382</BN9>
<ACCOUNT/>
<NAME_ORG1>Broker 1</NAME_ORG1>
</PARTY>
<DN_DATE>2017-04-25</DN_DATE>
<TRANS_DATE>2015-10-01</TRANS_DATE>
<IMP_SECT>1</IMP_SECT>
<PAYMENTS>250.0</PAYMENTS>
<DISB>0.0</DISB>
</HEADER>
<DETAILS count="1">
<IMPORTER id="1">
<PARTY>
<BN9>654321001</BN9>
<ACCOUNT>RM0001</ACCOUNT>
<NAME_ORG1>IMPORTER 1</NAME_ORG1>
</PARTY>
<PAYMENTS>250.0</PAYMENTS>
<DISB>0.0</DISB>
<B3_DEBT>
<LINEITEMS count="2">
<LINEITEM id="1">
<DOC_TYPE>B3</DOC_TYPE>
<REL_DATE>2015-09-29</REL_DATE>
<ACC_DATE>2015-10-01</ACC_DATE>
<FIELD6>G</FIELD6>
<PORT>0481</PORT>
<DOC_NUMBER>14362100000004</DOC_NUMBER>
<AMOUNTS>
<DUTIES>1000.0</DUTIES>
<GST>1254.23</GST>
<TOTAL>2254.23</TOTAL>
</AMOUNTS>
</LINEITEM>
<LINEITEM id="2">
<DOC_TYPE>B3</DOC_TYPE>
<REL_DATE>2015-09-29</REL_DATE>
<ACC_DATE>2015-10-01</ACC_DATE>
<FIELD6>G</FIELD6>
<PORT>0481</PORT>
<DOC_NUMBER>14362100000004</DOC_NUMBER>
<AMOUNTS>
<DUTIES>2000.0</DUTIES>
<GST>200.23</GST>
<TOTAL>2200.23</TOTAL>
</AMOUNTS>
</LINEITEM>
</LINEITEMS>
<LINE_TOTALS>
<AMOUNTS>
<DUTIES>3000.0</DUTIES>
<GST>1454.46</GST>
<TOTAL>4454.46</TOTAL>
</AMOUNTS>
</LINE_TOTALS>
</B3_DEBT>
<OTHERS otheritem_count="6">
<OTHERITEM id="1">
<DOC_DATE>2017-03-07</DOC_DATE>
<DOC_TYPE>B2-1, AP/CP</DOC_TYPE>
<DOC_NUMBER>10123780099009</DOC_NUMBER>
<REL_DOC_NUMBER>10123763337658</REL_DOC_NUMBER>
<PAY_DUE>2017-03-31</PAY_DUE>
<TOT_DUE>-500.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
<OTHERITEM id="2">
<DOC_DATE>2017-02-07</DOC_DATE>
<DOC_TYPE>K23</DOC_TYPE>
<DOC_NUMBER>154639123</DOC_NUMBER>
<REL_DOC_NUMBER/>
<PAY_DUE>2017-03-31</PAY_DUE>
<TOT_DUE>50.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
<OTHERITEM id="3">
<DOC_DATE>2017-03-31</DOC_DATE>
<DOC_TYPE>PMT</DOC_TYPE>
<DOC_NUMBER>20000001937</DOC_NUMBER>
<REL_DOC_NUMBER/>
<PAY_DUE>2017-03-31</PAY_DUE>
<TOT_DUE>-16950.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
<OTHERITEM id="4">
<DOC_DATE>2017-03-20</DOC_DATE>
<DOC_TYPE>B2-1, AR/CR</DOC_TYPE>
<DOC_NUMBER>10123999423123</DOC_NUMBER>
<REL_DOC_NUMBER>10123999787658</REL_DOC_NUMBER>
<PAY_DUE>2017-05-02</PAY_DUE>
<TOT_DUE>560.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
<OTHERITEM id="5">
<DOC_DATE>2017-02-06</DOC_DATE>
<DOC_TYPE>K32</DOC_TYPE>
<DOC_NUMBER>T1854379</DOC_NUMBER>
<REL_DOC_NUMBER/>
<PAY_DUE>2017-03-31</PAY_DUE>
<TOT_DUE>-230.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
<OTHERITEM id="6">
<DOC_DATE>2017-02-09</DOC_DATE>
<DOC_TYPE>B2-1, AP/CP</DOC_TYPE>
<DOC_NUMBER>392754952</DOC_NUMBER>
<REL_DOC_NUMBER/>
<PAY_DUE>2017-03-31</PAY_DUE>
<TOT_DUE>-200.0</TOT_DUE>
<REVIEW/>
<BROKER/>
</OTHERITEM>
</OTHERS>
</IMPORTER>
</DETAILS>
</ZDNOTICE>

DonBakke · April 2018

No, that didn't work but I edited it for you.

KevinFournier · April 2018

Simply adding the following parameter to your existing SRP_Extract_Xml should work:


Result = SRP_Extract_Xml(Xml, XPath, 'xmlns="http://www.cbsa-asfc.gc.ca/ARL/ZARL/2017-11"')

DonBakke · April 2018

Now that we can see the XML properly we can focus on the problem. In order to avoid repeating what you may have already tried, did you actually attempt to pass in the NameSpace data?

DonBakke · April 2018

Yeah...what Kevin suggested! :)

JimLeong · April 2018

Yes, I did initially try adding the NameSpace. I tried it again with your revised code and Kevin's suggestion. The result is still blank.

When I debugged the Xml, I noticed there are "-"'s there. I tried turning the UTF8 on and off with the same results.

When I tried this in OI's XML workspace, it did read it, but I am missing the namespace data.

KevinFournier · April 2018

Maybe something like this?

Result = SRP_Extract_Xml(Xml, '/s:ZDNOTICE', 'xmlns:s="http://www.cbsa-asfc.gc.ca/ARL/ZARL/2017-11"')
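
For paths that go deeper than the root, every element step needs the prefix, because the default namespace on ZDNOTICE is inherited by all of its descendants. The following is a minimal sketch in Python's standard `xml.etree.ElementTree`, not SRP_Extract_Xml, using a trimmed copy of the notice posted above; the prefix `s` is the arbitrary name from the query, not something in the document.

```python
import xml.etree.ElementTree as ET

NOTICE = """<?xml version="1.0"?>
<ZDNOTICE xmlns="http://www.cbsa-asfc.gc.ca/ARL/ZARL/2017-11" file_split="E01" file_type="B" version="201711">
<HEADER>
<PARTY>
<BN9>101178382</BN9>
<NAME_ORG1>Broker 1</NAME_ORG1>
</PARTY>
<PAYMENTS>250.0</PAYMENTS>
</HEADER>
</ZDNOTICE>"""

NS = {"s": "http://www.cbsa-asfc.gc.ca/ARL/ZARL/2017-11"}

root = ET.fromstring(NOTICE)  # root is the ZDNOTICE element

# Every step carries the prefix, including the last one.
bn9 = root.findtext("s:HEADER/s:PARTY/s:BN9", namespaces=NS)

# Leaving the prefix off any single step makes the whole path miss.
missing = root.findtext("s:HEADER/s:PARTY/BN9", namespaces=NS)

# Unprefixed attributes are not in the default namespace, so no prefix here.
file_type = root.get("file_type")

print(bn9, missing, file_type)
```

The second lookup mirrors the earlier report in this thread of a query failing because the namespace identifier was left off the last tag.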

JimLeong · April 2018

Yep.. that worked! Thank You very Much!
